vba VB6/VBScript 更改文件编码为ansi

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5182102/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 11:11:58  来源:igfitidea点击:

VB6/VBScript change file encoding to ansi

vbaencodingvbscriptvb6ansi

提问by HerbalMart

I am looking for a way to convert a textfile with UTF8 encoding to ANSI encoding.

我正在寻找一种将具有 UTF8 编码的文本文件转换为 ANSI 编码的方法。

How can i go around and achieve this in Visual Basic (VB6) and or vbscript?

我怎样才能在 Visual Basic (VB6) 和/或 vbscript 中解决这个问题?

回答by Bob77

If your files aren't truly enormous (e.g. even merely 40MB can be painfully slow) you can do this using the following code in VB6, VBA, or VBScript:

如果您的文件不是很大(例如,即使只有 40MB 也可能非常慢),您可以使用以下 VB6、VBA 或 VBScript 代码来完成此操作:

Option Explicit

Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText

    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub

UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName

Note that it will handle UTF-8 input files either with or without a BOM.

请注意,它将处理带有或不带有 BOM 的 UTF-8 输入文件。

Using strong typing and early binding will improve performance a hair in VB6, and you won't need to declare those Const values. This isn't an option in script though.

使用强类型和早期绑定将提高 VB6 中的性能,并且您不需要声明那些 Const 值。但这不是脚本中的选项。

For VB6 programs that need to process very large files you might be better off using VB6 native I/O against Byte arrays and use an API call to convert the data in chunks. This adds the extra messiness of finding the character boundaries though (UTF-8 uses a variable number of bytes per character). You'd need to scan each data block you read to find a safe ending point for an API translation.

对于需要处理非常大文件的 VB6 程序,您最好对 Byte 数组使用 VB6 本机 I/O,并使用 API 调用以块的形式转换数据。不过,这增加了查找字符边界的额外麻烦(UTF-8 使用每个字符的可变字节数)。您需要扫描您读取的每个数据块,以找到 API 转换的安全终点。

I'd look at MultiByteToWideChar() and WideCharToMultiByte() to get started.

我会看看 MultiByteToWideChar() 和 WideCharToMultiByte() 开始。

Note that UTF-8 often "arrives" with LF line delimiters instead of CRLF.

请注意,UTF-8 通常使用 LF 行分隔符而不是 CRLF“到达”。

回答by wqw

I'm using these helper functions

我正在使用这些辅助函数

Private Function pvReadFile(sFile)
    Const ForReading = 1
    Dim sPrefix

    With CreateObject("Scripting.FileSystemObject")
        sPrefix = .OpenTextFile(sFile, ForReading, False, False).Read(3)
    End With
    If Left(sPrefix, 3) <> Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
        With CreateObject("Scripting.FileSystemObject")
            pvReadFile = .OpenTextFile(sFile, ForReading, False, Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE)).ReadAll()
        End With
    Else
        With CreateObject("ADODB.Stream")
            .Open
            If Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE) Then
                .Charset = "Unicode"
            ElseIf Left(sPrefix, 3) = Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
                .Charset = "UTF-8"
            Else
                .Charset = "_autodetect"
            End If
            .LoadFromFile sFile
            pvReadFile = .ReadText
        End With
    End If
End Function

Private Function pvWriteFile(sFile, sText, lType)
    Const adSaveCreateOverWrite = 2

    With CreateObject("ADODB.Stream")
        .Open
        If lType = 2 Then
            .Charset = "Unicode"
        ElseIf lType = 3 Then
            .Charset = "UTF-8"
        Else
            .Charset = "_autodetect"
        End If
        .WriteText sText
        .SaveToFile sFile, adSaveCreateOverWrite
    End With
End Function

I found out that "native" FileSystemObject reading of ANSI and UTF-16/UCS-2 files is much faster that ADODB.Stream hack.

我发现 ANSI 和 UTF-16/UCS-2 文件的“本机” FileSystemObject 读取速度比 ADODB.Stream hack 快得多。

回答by Ciove

I'm using this script to convert any character set or code page (that i'm aware of).

我正在使用这个脚本来转换任何字符集或代码页(我知道)。

This script can also handle large files(over one gigabytes), because it streams one line at a time.

该脚本还可以处理大文件(超过 1 GB),因为它一次传输一行。

' - ConvertCharset.vbs -
'
' Inspired by: 
' http://www.vbforums.com/showthread.php?533879-Generate-text-files-in-IBM-850-encoding
' http://stackoverflow.com/questions/5182102/vb6-vbscript-change-file-encoding-to-ansii/5186170#5186170
' http://stackoverflow.com/questions/13130214/how-to-convert-a-batch-file-stored-in-utf-8-to-something-that-works-via-another
' 
' Start Main
Dim objArguments
Dim strSyntaxtext, strInputCharset, strOutputCharset, strInputFile, strOutputFile 
Dim intReadPosition, intWritePosition
Dim arrCharsets

Const adReadAll = -1
Const adReadLine = -2
Const adSaveCreateOverWrite = 2
Const adSaveCreateNotExist = 1
Const adTypeBinary = 1
Const adTypeText = 2
Const adWriteChar = 0
Const adWriteLine = 1

strSyntaxtext = strSyntaxtext & "Converts the charset of the input text file to output file." & vbCrLf
strSyntaxtext = strSyntaxtext & "Syntax: "  & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf
strSyntaxtext = strSyntaxtext & "              /OutputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /InputFile:\path\to\inputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /OutputFile:\path\to\outputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              [/ShowAllCharSets]" & vbCrLf & vbCrLf 
strSyntaxtext = strSyntaxtext & "Example:" & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:ibm850 /OutputCharset:utf-8 /InputFile:my_dos.txt /OutputFile:my_utf-8.txt" & vbCrLf

Set objArgumentsNamed = WScript.Arguments.Named
If objArgumentsNamed.Count = 0  Then 
   WScript.Echo strSyntaxtext
   WScript.Quit(99)
End If

arrCharsets = Split("big5,big5-hkscs,euc-jp,euc-kr,gb18030,gb2312,gbk,ibm-thai," &_
                    "ibm00858,ibm01140,ibm01141,ibm01142,ibm01143,ibm01144," &_
                    "ibm01145,ibm01146,ibm01147,ibm01148,ibm01149,ibm037," &_
                    "ibm1026,ibm273,ibm277,ibm278,ibm280,ibm284,ibm285,ibm297," &_
                    "ibm420,ibm424,ibm437,ibm500,ibm775,ibm850,ibm852,ibm855," &_
                    "ibm857,ibm860,ibm861,ibm862,ibm863,ibm864,ibm865,ibm866," &_
                    "ibm869,ibm870,ibm871,iso-2022-jp,iso-2022-kr,iso-8859-1," &_
                    "iso-8859-13,iso-8859-15,iso-8859-2,iso-8859-3,iso-8859-4," &_
                    "iso-8859-5,iso-8859-6,iso-8859-7,iso-8859-8,iso-8859-9," &_
                    "koi8-r,koi8-u,shift_jis,tis-620,us-ascii,utf-16,utf-16be," &_
                    "utf-16le,utf-7,utf-8,windows-1250,windows-1251,windows-1252," &_
                    "windows-1253,windows-1254,windows-1255,windows-1256," &_
                    "windows-1257,windows-1258,unicode", ",")

Set objFileSystem = CreateObject("Scripting.FileSystemObject")

For Each objArgumentNamed in objArgumentsNamed
   Select Case Lcase(objArgumentNamed)
      Case "inputcharset"
         strInputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strInputCharset) Then 
            WScript.Echo "The InputCharset (" & strInputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(1)
         End If
      Case "outputcharset"
         strOutputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strOutputCharset) Then 
            WScript.Echo "The strOutputCharset (" & strOutputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(2)
         End If
      Case "inputfile"
         strInputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If Not objFileSystem.FileExists(strInputFile) Then  
            WScript.Echo "The InputFile (" & strInputFile  & ") does not exist, quitting."  & vbCrLf
            WScript.Quit(3)
         End If
      Case "outputfile"
         strOutputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If objFileSystem.FileExists(strOutputFile) Then  
            WScript.Echo "The OutputFile  (" & strOutputFile & ") exists, quitting."  & vbCrLf
            WScript.Quit(4)
         End If
      Case "showallcharsets"
         x = ShowCharsets()
      Case Else
         WScript.Echo "Unknown parameter, quitting: /" & objArgumentNamed & ":" & objArgumentsNamed(objArgumentNamed)
         WScript.Echo strSyntaxtext
   End Select 
Next

If Len(strInputCharset) > 0 And Len(strOutputCharset) > 0 And Len(strInputFile) > 0 And Len(strOutputFile) Then 
   Set objInputStream = CreateObject("ADODB.Stream")
   Set objOutputStream = CreateObject("ADODB.Stream")

   With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      intWritePosition = 0
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
   End With
   objOutputStream.SaveToFile strOutputFile , adSaveCreateNotExist
   objOutputStream.Close
   WScript.Echo "The " & objFileSystem.GetFileName(strInputFile) & " was converted to "  & objFileSystem.GetFileName(strOutputFile) & " OK."
End If
' End Main

' Start Functions 

Function IsCharset(strMyCharset)
IsCharset = False
For Each strCharset in arrCharsets
   If strCharset = strMyCharset Then 
      IsCharset = True
      Exit For
   End If
Next
End Function 

Function ShowCharsets()
strDisplayCharsets = ""
intCounter = 0
For Each strcharset in arrCharsets
   intCounter = intCounter + Len(strcharset) + 1
   strDisplayCharsets = strDisplayCharsets & strcharset & ","
   If intCounter > 67 Then 
      intCounter = 0
      strDisplayCharsets = strDisplayCharsets & vbCrLf 
   End If
Next
strDisplayCharsets = Mid(strDisplayCharsets, 1, Len(strDisplayCharsets)-1)
WScript.Echo strDisplayCharsets 
End Function 
' End Functions 

回答by nurettin

@Bob77's answer did not work for me, so I converted @Ciove's answer to a simple sub routine and it works fine.

@Bob77 的答案对我不起作用,所以我将 @Ciove 的答案转换为一个简单的子例程,它工作正常。

' Usage: 
' EncodeFile strInFile, "UTF-8", strOutFile, "Windows-1254", 2
Sub EncodeFile(strInputFile, strInputCharset, strOutputFile, strOutputCharset, intOverwriteMode)

    '5th parameter may take the following values:
    'Const adSaveCreateOverWrite = 2
    'Const adSaveCreateNotExist = 1

    Const adReadLine = -2
    Const adTypeBinary = 1
    Const adTypeText = 2
    Const adWriteLine = 1

    Set objInputStream = CreateObject("ADODB.Stream")
    Set objOutputStream = CreateObject("ADODB.Stream")

    With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
    End With
    objOutputStream.SaveToFile strOutputFile, intOverwriteMode
    objOutputStream.Close
End Sub