vba 将excel分析保存为UTF-8文本文件的VBA宏

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/28197246/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-12 06:00:21  来源:igfitidea点击:

VBA Macro to save excel analysis as UTF-8 textfile

excelvbaexcel-vbautf-8

提问by Stephane

I have an XLS file containing some foreign characters. I have tried the following approach, but the data looks funky:

我有一个包含一些外来字符的 XLS 文件。我尝试了以下方法,但数据看起来很奇怪:

--- removed code above
wb.SaveAs fPath & Replace(fName, ".xlsx", ".txt"), FileFormat:=xlUnicodeText, CreateBackup:=False
wb.Saved = True
wb.Close True
ActiveWorkbook.Close


tFileToOpenPath = fPath & Replace(fName, ".xlsx", ".txt")
tFileToSavePath = fPath & Replace(fName, ".xlsx", "-UTF8.txt")

Dim oStream
Set oStream = CreateObject("ADODB.Stream") 'Create Stream object 
With oStream
    .Type = 2 'Specify stream type – we want To save text/string data. 
    .CharSet = "utf-8" 'Specify charset For the source text data. 
    .Open 'Open the stream 
    .LoadFromFile tFileToOpenPath  'And write the file to the object stream 
    .SaveToFile tFileToSavePath, 2  'Save the data to the named path 
End With
Set oStream = Nothing 'Close the stream - no memory leaks

The output data looks like that... I'm puzzled

输出数据看起来是这样的......我很困惑

S a l e s 

The txt file itself is fine, just the UTF-8.txt one is really weird

txt 文件本身没问题,只是 UTF-8.txt 一个真的很奇怪

Thanks!

谢谢!

回答by KekuSemau

Here's what happens:
wb.SaveAssaves your workbook as xlUnicodeText. This is in fact UTF16LE (which is mostly meant when anyone just says 'Unicode').

会发生以下情况:
wb.SaveAs将您的工作簿另存为xlUnicodeText. 这实际上是 UTF16LE(这主要是指任何人只说“Unicode”)。

Your Stream Object opens this UTF16 source, but reads it as UTF8. In UTF16, characters take up a minimum of two bytes. In UTF8, simple latin characters only use one byte, hence an empty byte after most characters (displayed in your case as spaces; notepad++ would display NUL).

您的 Stream 对象打开此 UTF16 源,但将其读取为 UTF8。在 UTF16 中,字符最少占用两个字节。在 UTF8 中,简单的拉丁字符仅使用一个字节,因此在大多数字符之后有一个空字节(在您的情况下显示为空格;notepad++ 将显示NUL)。

This seems to work for a conversion:

这似乎适用于转换:

Public Sub convert_UnicodeToUTF8(parF1 As String, parF2 As String)
    Const adSaveCreateOverWrite = 2
    Const adTypeText = 2

    Dim streamSrc, streamDst ' Source / Destination
    Set streamSrc = CreateObject("ADODB.Stream")
    Set streamDst = CreateObject("ADODB.Stream")
    streamDst.Type = adTypeText
    streamDst.Charset = "utf-8"
    streamDst.Open

    With streamSrc
        .Type = adTypeText
        .Charset = "Unicode" ' this is also the default value
        .Open
        .LoadFromFile parF1
        .copyTo streamDst
        .Close
    End With
    streamDst.saveToFile parF2, adSaveCreateOverWrite
    streamDst.Close
    Set streamSrc = Nothing
    Set streamDst = Nothing

End Sub

See ADODB.Stream.CopyTo.

请参阅ADODB.Stream.CopyTo

the CharSet property of the destination Stream object can be different than the source Stream object`

目标 Stream 对象的 CharSet 属性可以不同于源 Stream 对象`