vba 将excel分析保存为UTF-8文本文件的VBA宏
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/28197246/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
VBA Macro to save excel analysis as UTF-8 textfile
提问by Stephane
I have an XLS file containing some foreign characters. I have tried the following approach, but the data looks funky:
我有一个包含一些外来字符的 XLS 文件。我尝试了以下方法,但数据看起来很奇怪:
--- removed code above
wb.SaveAs fPath & Replace(fName, ".xlsx", ".txt"), FileFormat:=xlUnicodeText, CreateBackup:=False
wb.Saved = True
wb.Close True
ActiveWorkbook.Close
tFileToOpenPath = fPath & Replace(fName, ".xlsx", ".txt")
tFileToSavePath = fPath & Replace(fName, ".xlsx", "-UTF8.txt")
Dim oStream
Set oStream = CreateObject("ADODB.Stream") 'Create Stream object
With oStream
.Type = 2 'Specify stream type – we want To save text/string data.
.CharSet = "utf-8" 'Specify charset For the source text data.
.Open 'Open the stream
.LoadFromFile tFileToOpenPath 'And write the file to the object stream
.SaveToFile tFileToSavePath, 2 'Save the data to the named path
End With
Set oStream = Nothing 'Close the stream - no memory leaks
The output data looks like that... I'm puzzled
输出数据看起来是这样的......我很困惑
S a l e s
The txt file itself is fine, just the UTF-8.txt one is really weird
txt 文件本身没问题,只是 UTF-8.txt 一个真的很奇怪
Thanks!
谢谢!
回答by KekuSemau
Here's what happens:wb.SaveAs
saves your workbook as xlUnicodeText
. This is in fact UTF16LE (which is mostly meant when anyone just says 'Unicode').
会发生以下情况:wb.SaveAs
将您的工作簿另存为xlUnicodeText
. 这实际上是 UTF16LE(这主要是指任何人只说“Unicode”)。
Your Stream Object opens this UTF16 source, but reads it as UTF8. In UTF16, characters take up a minimum of two bytes. In UTF8, simple latin characters only use one byte, hence an empty byte after most characters (displayed in your case as spaces; notepad++ would display NUL).
您的 Stream 对象打开此 UTF16 源,但将其读取为 UTF8。在 UTF16 中,字符最少占用两个字节。在 UTF8 中,简单的拉丁字符仅使用一个字节,因此在大多数字符之后有一个空字节(在您的情况下显示为空格;notepad++ 将显示NUL)。
This seems to work for a conversion:
这似乎适用于转换:
Public Sub convert_UnicodeToUTF8(parF1 As String, parF2 As String)
Const adSaveCreateOverWrite = 2
Const adTypeText = 2
Dim streamSrc, streamDst ' Source / Destination
Set streamSrc = CreateObject("ADODB.Stream")
Set streamDst = CreateObject("ADODB.Stream")
streamDst.Type = adTypeText
streamDst.Charset = "utf-8"
streamDst.Open
With streamSrc
.Type = adTypeText
.Charset = "Unicode" ' this is also the default value
.Open
.LoadFromFile parF1
.copyTo streamDst
.Close
End With
streamDst.saveToFile parF2, adSaveCreateOverWrite
streamDst.Close
Set streamSrc = Nothing
Set streamDst = Nothing
End Sub
See ADODB.Stream.CopyTo.
the CharSet property of the destination Stream object can be different than the source Stream object`
目标 Stream 对象的 CharSet 属性可以不同于源 Stream 对象`