vba VB6/VBScript 更改文件编码为ansi
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5182102/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
VB6/VBScript change file encoding to ansi
提问by HerbalMart
I am looking for a way to convert a textfile with UTF8 encoding to ANSI encoding.
我正在寻找一种将具有 UTF8 编码的文本文件转换为 ANSI 编码的方法。
How can i go around and achieve this in Visual Basic (VB6) and or vbscript?
我怎样才能在 Visual Basic (VB6) 和/或 vbscript 中解决这个问题?
回答by Bob77
If your files aren't truly enormous (e.g. even merely 40MB can be painfully slow) you can do this using the following code in VB6, VBA, or VBScript:
如果您的文件不是很大(例如,即使只有 40MB 也可能非常慢),您可以使用以下 VB6、VBA 或 VBScript 代码来完成此操作:
Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
Dim strText
With CreateObject("ADODB.Stream")
.Open
.Type = adTypeBinary
.LoadFromFile UTF8FName
.Type = adTypeText
.Charset = "utf-8"
strText = .ReadText(adReadAll)
.Position = 0
.SetEOS
.Charset = "_autodetect" 'Use current ANSI codepage.
.WriteText strText, adWriteChar
.SaveToFile ANSIFName, adSaveCreateOverWrite
.Close
End With
End Sub
UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName
Note that it will handle UTF-8 input files either with or without a BOM.
请注意,它将处理带有或不带有 BOM 的 UTF-8 输入文件。
Using strong typing and early binding will improve performance a hair in VB6, and you won't need to declare those Const values. This isn't an option in script though.
使用强类型和早期绑定将提高 VB6 中的性能,并且您不需要声明那些 Const 值。但这不是脚本中的选项。
For VB6 programs that need to process very large files you might be better off using VB6 native I/O against Byte arrays and use an API call to convert the data in chunks. This adds the extra messiness of finding the character boundaries though (UTF-8 uses a variable number of bytes per character). You'd need to scan each data block you read to find a safe ending point for an API translation.
对于需要处理非常大文件的 VB6 程序,您最好对 Byte 数组使用 VB6 本机 I/O,并使用 API 调用以块的形式转换数据。不过,这增加了查找字符边界的额外麻烦(UTF-8 使用每个字符的可变字节数)。您需要扫描您读取的每个数据块,以找到 API 转换的安全终点。
I'd look at MultiByteToWideChar() and WideCharToMultiByte() to get started.
我会看看 MultiByteToWideChar() 和 WideCharToMultiByte() 开始。
Note that UTF-8 often "arrives" with LF line delimiters instead of CRLF.
请注意,UTF-8 通常使用 LF 行分隔符而不是 CRLF“到达”。
回答by wqw
I'm using these helper functions
我正在使用这些辅助函数
Private Function pvReadFile(sFile)
Const ForReading = 1
Dim sPrefix
With CreateObject("Scripting.FileSystemObject")
sPrefix = .OpenTextFile(sFile, ForReading, False, False).Read(3)
End With
If Left(sPrefix, 3) <> Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
With CreateObject("Scripting.FileSystemObject")
pvReadFile = .OpenTextFile(sFile, ForReading, False, Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE)).ReadAll()
End With
Else
With CreateObject("ADODB.Stream")
.Open
If Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE) Then
.Charset = "Unicode"
ElseIf Left(sPrefix, 3) = Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
.Charset = "UTF-8"
Else
.Charset = "_autodetect"
End If
.LoadFromFile sFile
pvReadFile = .ReadText
End With
End If
End Function
Private Function pvWriteFile(sFile, sText, lType)
Const adSaveCreateOverWrite = 2
With CreateObject("ADODB.Stream")
.Open
If lType = 2 Then
.Charset = "Unicode"
ElseIf lType = 3 Then
.Charset = "UTF-8"
Else
.Charset = "_autodetect"
End If
.WriteText sText
.SaveToFile sFile, adSaveCreateOverWrite
End With
End Function
I found out that "native" FileSystemObject reading of ANSI and UTF-16/UCS-2 files is much faster that ADODB.Stream hack.
我发现 ANSI 和 UTF-16/UCS-2 文件的“本机” FileSystemObject 读取速度比 ADODB.Stream hack 快得多。
回答by Ciove
I'm using this script to convert any character set or code page (that i'm aware of).
我正在使用这个脚本来转换任何字符集或代码页(我知道)。
This script can also handle large files(over one gigabytes), because it streams one line at a time.
该脚本还可以处理大文件(超过 1 GB),因为它一次传输一行。
' - ConvertCharset.vbs -
'
' Inspired by:
' http://www.vbforums.com/showthread.php?533879-Generate-text-files-in-IBM-850-encoding
' http://stackoverflow.com/questions/5182102/vb6-vbscript-change-file-encoding-to-ansii/5186170#5186170
' http://stackoverflow.com/questions/13130214/how-to-convert-a-batch-file-stored-in-utf-8-to-something-that-works-via-another
'
' Start Main
Dim objArguments
Dim strSyntaxtext, strInputCharset, strOutputCharset, strInputFile, strOutputFile
Dim intReadPosition, intWritePosition
Dim arrCharsets
Const adReadAll = -1
Const adReadLine = -2
Const adSaveCreateOverWrite = 2
Const adSaveCreateNotExist = 1
Const adTypeBinary = 1
Const adTypeText = 2
Const adWriteChar = 0
Const adWriteLine = 1
strSyntaxtext = strSyntaxtext & "Converts the charset of the input text file to output file." & vbCrLf
strSyntaxtext = strSyntaxtext & "Syntax: " & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf
strSyntaxtext = strSyntaxtext & " /OutputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf
strSyntaxtext = strSyntaxtext & " /InputFile:\path\to\inputfile.ext" & vbCrLf
strSyntaxtext = strSyntaxtext & " /OutputFile:\path\to\outputfile.ext" & vbCrLf
strSyntaxtext = strSyntaxtext & " [/ShowAllCharSets]" & vbCrLf & vbCrLf
strSyntaxtext = strSyntaxtext & "Example:" & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:ibm850 /OutputCharset:utf-8 /InputFile:my_dos.txt /OutputFile:my_utf-8.txt" & vbCrLf
Set objArgumentsNamed = WScript.Arguments.Named
If objArgumentsNamed.Count = 0 Then
WScript.Echo strSyntaxtext
WScript.Quit(99)
End If
arrCharsets = Split("big5,big5-hkscs,euc-jp,euc-kr,gb18030,gb2312,gbk,ibm-thai," &_
"ibm00858,ibm01140,ibm01141,ibm01142,ibm01143,ibm01144," &_
"ibm01145,ibm01146,ibm01147,ibm01148,ibm01149,ibm037," &_
"ibm1026,ibm273,ibm277,ibm278,ibm280,ibm284,ibm285,ibm297," &_
"ibm420,ibm424,ibm437,ibm500,ibm775,ibm850,ibm852,ibm855," &_
"ibm857,ibm860,ibm861,ibm862,ibm863,ibm864,ibm865,ibm866," &_
"ibm869,ibm870,ibm871,iso-2022-jp,iso-2022-kr,iso-8859-1," &_
"iso-8859-13,iso-8859-15,iso-8859-2,iso-8859-3,iso-8859-4," &_
"iso-8859-5,iso-8859-6,iso-8859-7,iso-8859-8,iso-8859-9," &_
"koi8-r,koi8-u,shift_jis,tis-620,us-ascii,utf-16,utf-16be," &_
"utf-16le,utf-7,utf-8,windows-1250,windows-1251,windows-1252," &_
"windows-1253,windows-1254,windows-1255,windows-1256," &_
"windows-1257,windows-1258,unicode", ",")
Set objFileSystem = CreateObject("Scripting.FileSystemObject")
For Each objArgumentNamed in objArgumentsNamed
Select Case Lcase(objArgumentNamed)
Case "inputcharset"
strInputCharset = LCase(objArgumentsNamed(objArgumentNamed))
If Not IsCharset(strInputCharset) Then
WScript.Echo "The InputCharset (" & strInputCharset & ") is not valid, quitting. The valid charsets are:" & vbCrLf
x = ShowCharsets()
WScript.Quit(1)
End If
Case "outputcharset"
strOutputCharset = LCase(objArgumentsNamed(objArgumentNamed))
If Not IsCharset(strOutputCharset) Then
WScript.Echo "The strOutputCharset (" & strOutputCharset & ") is not valid, quitting. The valid charsets are:" & vbCrLf
x = ShowCharsets()
WScript.Quit(2)
End If
Case "inputfile"
strInputFile = LCase(objArgumentsNamed(objArgumentNamed))
If Not objFileSystem.FileExists(strInputFile) Then
WScript.Echo "The InputFile (" & strInputFile & ") does not exist, quitting." & vbCrLf
WScript.Quit(3)
End If
Case "outputfile"
strOutputFile = LCase(objArgumentsNamed(objArgumentNamed))
If objFileSystem.FileExists(strOutputFile) Then
WScript.Echo "The OutputFile (" & strOutputFile & ") exists, quitting." & vbCrLf
WScript.Quit(4)
End If
Case "showallcharsets"
x = ShowCharsets()
Case Else
WScript.Echo "Unknown parameter, quitting: /" & objArgumentNamed & ":" & objArgumentsNamed(objArgumentNamed)
WScript.Echo strSyntaxtext
End Select
Next
If Len(strInputCharset) > 0 And Len(strOutputCharset) > 0 And Len(strInputFile) > 0 And Len(strOutputFile) Then
Set objInputStream = CreateObject("ADODB.Stream")
Set objOutputStream = CreateObject("ADODB.Stream")
With objInputStream
.Open
.Type = adTypeBinary
.LoadFromFile strInputFile
.Type = adTypeText
.Charset = strInputCharset
intWritePosition = 0
objOutputStream.Open
objOutputStream.Charset = strOutputCharset
Do While .EOS <> True
strText = .ReadText(adReadLine)
objOutputStream.WriteText strText, adWriteLine
Loop
.Close
End With
objOutputStream.SaveToFile strOutputFile , adSaveCreateNotExist
objOutputStream.Close
WScript.Echo "The " & objFileSystem.GetFileName(strInputFile) & " was converted to " & objFileSystem.GetFileName(strOutputFile) & " OK."
End If
' End Main
' Start Functions
Function IsCharset(strMyCharset)
IsCharset = False
For Each strCharset in arrCharsets
If strCharset = strMyCharset Then
IsCharset = True
Exit For
End If
Next
End Function
Function ShowCharsets()
strDisplayCharsets = ""
intCounter = 0
For Each strcharset in arrCharsets
intCounter = intCounter + Len(strcharset) + 1
strDisplayCharsets = strDisplayCharsets & strcharset & ","
If intCounter > 67 Then
intCounter = 0
strDisplayCharsets = strDisplayCharsets & vbCrLf
End If
Next
strDisplayCharsets = Mid(strDisplayCharsets, 1, Len(strDisplayCharsets)-1)
WScript.Echo strDisplayCharsets
End Function
' End Functions
回答by nurettin
@Bob77's answer did not work for me, so I converted @Ciove's answer to a simple sub routine and it works fine.
@Bob77 的答案对我不起作用,所以我将 @Ciove 的答案转换为一个简单的子例程,它工作正常。
' Usage:
' EncodeFile strInFile, "UTF-8", strOutFile, "Windows-1254", 2
Sub EncodeFile(strInputFile, strInputCharset, strOutputFile, strOutputCharset, intOverwriteMode)
'5th parameter may take the following values:
'Const adSaveCreateOverWrite = 2
'Const adSaveCreateNotExist = 1
Const adReadLine = -2
Const adTypeBinary = 1
Const adTypeText = 2
Const adWriteLine = 1
Set objInputStream = CreateObject("ADODB.Stream")
Set objOutputStream = CreateObject("ADODB.Stream")
With objInputStream
.Open
.Type = adTypeBinary
.LoadFromFile strInputFile
.Type = adTypeText
.Charset = strInputCharset
objOutputStream.Open
objOutputStream.Charset = strOutputCharset
Do While .EOS <> True
strText = .ReadText(adReadLine)
objOutputStream.WriteText strText, adWriteLine
Loop
.Close
End With
objOutputStream.SaveToFile strOutputFile, intOverwriteMode
objOutputStream.Close
End Sub