VBA Output to file using UTF-16

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/9092548/

Time: 2020-09-11 15:06:47  Source: igfitidea

Tags: xml, vba, utf-16, byte-order-mark

Asked by Alex McMillan

I have a very complex problem that is difficult to explain properly. There is LOTS of discussion about this across the internet, but nothing definitive. Any help, or a better explanation than mine, is greatly appreciated.

Essentially, I'm just trying to write an XML file using UTF-16 with VBA.

If I do this:

sXML = "<?xml version='1.0' encoding='utf-8'?>"
sXML = sXML & rest_of_xml_document
Print #iFile, sXML

then I get a file that is valid XML. However, if I change the "encoding=" to "utf-16", I get this error from my XML validator:

Switch from current encoding to specified encoding not supported.

Googling tells me that this means the xml encoding attribute is different to the ACTUAL encoding used by the file, hence I must be creating a utf-8 document via Open and Print commands.
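
A quick way to see what the file really contains, as opposed to what its XML declaration claims, is to read its first bytes. The function below is a minimal sketch; DescribeBom is a hypothetical helper, and it only recognizes the three common BOMs plus a simplified form of the `3C 00 3F 00` pattern (a `<?` stored as little-endian 16-bit text) that Appendix F of the XML 1.0 specification lists for BOM-less UTF-16LE.

```vba
' Sketch: report which byte order mark (if any) a file starts with.
' DescribeBom is a hypothetical helper name, not a built-in.
Function DescribeBom(ByVal sFilename As String) As String
    Dim iFile As Integer, n As Long, i As Long
    Dim b(0 To 2) As Byte

    If Len(Dir$(sFilename)) = 0 Then Err.Raise 53   ' File not found

    ' Read at most the first three bytes, one at a time
    iFile = FreeFile
    Open sFilename For Binary Access Read As #iFile
    n = LOF(iFile)
    If n > 3 Then n = 3
    For i = 0 To n - 1
        Get #iFile, i + 1, b(i)
    Next i
    Close #iFile

    If n >= 3 And b(0) = &HEF And b(1) = &HBB And b(2) = &HBF Then
        DescribeBom = "UTF-8 BOM (EF BB BF)"
    ElseIf n >= 2 And b(0) = &HFF And b(1) = &HFE Then
        DescribeBom = "UTF-16LE BOM (FF FE)"
    ElseIf n >= 2 And b(0) = &HFE And b(1) = &HFF Then
        DescribeBom = "UTF-16BE BOM (FE FF)"
    ElseIf n >= 2 And b(0) = &H3C And b(1) = 0 Then
        DescribeBom = "no BOM, looks like UTF-16LE ('<' followed by 00)"
    Else
        DescribeBom = "no BOM (ANSI, UTF-8 or unknown)"
    End If
End Function
```

For a file written with Open and Print #, this reports "no BOM (ANSI, UTF-8 or unknown)", because nothing is prepended and `<?` is stored as the single bytes 3C 3F.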

If I do something like:

With CreateObject("ADODB.Stream")
  .Type = 2                  ' adTypeText
  .Charset = "utf-16"
  .Open
  .WriteText sXML
  .SaveToFile sFilename, 2   ' adSaveCreateOverWrite
  .Close
End With

then I end up with some funky characters (the BOM) at the beginning of my file which causes it to fail XML validation.
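
One commonly used workaround keeps ADODB.Stream but drops the BOM before saving: write the text, rewind and switch the same stream to binary, skip the first two bytes (the `FF FE` that ADODB writes for the "utf-16" charset) and copy the rest into a second binary stream. This is a minimal sketch, assuming ADODB is reachable through late binding as in the code above; SaveUtf16NoBom is a hypothetical name.

```vba
' Sketch: save text as UTF-16LE without a BOM via two ADODB.Stream objects.
' SaveUtf16NoBom is a hypothetical helper name.
Sub SaveUtf16NoBom(ByVal sText As String, ByVal sFilename As String)
    Dim txt As Object, bin As Object

    Set txt = CreateObject("ADODB.Stream")
    txt.Type = 2                 ' adTypeText
    txt.Charset = "utf-16"       ' ADODB prepends the FF FE BOM here
    txt.Open
    txt.WriteText sText

    ' Type can only be changed at Position 0; then skip the 2-byte BOM
    txt.Position = 0
    txt.Type = 1                 ' adTypeBinary
    txt.Position = 2

    Set bin = CreateObject("ADODB.Stream")
    bin.Type = 1                 ' adTypeBinary
    bin.Open
    txt.CopyTo bin               ' copies from the current position to the end
    bin.SaveToFile sFilename, 2  ' adSaveCreateOverWrite

    bin.Close
    txt.Close
End Sub
```

It is called as `SaveUtf16NoBom sXML, sFilename`; the resulting file holds only the UTF-16LE code units of sXML.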

If I open the file in Notepad++, delete the BOM and change the encoding to "UCS-2", then the file validates fine with a "utf-16" encoding value (meaning that UCS-2 is close enough to UTF-16 that it doesn't matter, or that XML is able to switch between these two encodings).
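
UCS-2 and UTF-16 store every character up to U+FFFF as the same single two-byte code unit; they only differ above U+FFFF, where UTF-16 uses a surrogate pair (two code units) and UCS-2 has no representation at all. VBA's ChrW$ takes a single code unit, so such a character has to be assembled from two. A minimal sketch; ChrU is a hypothetical helper name.

```vba
' Sketch: build the VBA string for any Unicode code point.
' ChrU is a hypothetical helper; ChrW$ alone only accepts one UTF-16 code unit.
Function ChrU(ByVal cp As Long) As String
    If cp < &H10000& Then
        ChrU = ChrW$(cp)              ' BMP: one code unit, identical to UCS-2
    Else
        cp = cp - &H10000&            ' 20-bit value split across two code units
        ChrU = ChrW$(&HD800& + (cp \ &H400&)) & ChrW$(&HDC00& + (cp And &H3FF&))
    End If
End Function
```

For example, `ChrU(&H1F600&)` returns the pair D83D DE00, so `Len` reports 2 characters and `LenB` reports 4 bytes for what is a single Unicode character.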

I need to use UTF-16 because UTF-8 doesn't cover all the characters used in the presentations I'm exporting.

The question:

How can I get VBA to behave like Notepad++, creating a UTF-16-encoded text file without a BOM that can be filled with XML data? ANY help much appreciated!

Answer by GSerg

Your point about UTF-8 not being able to store all the characters you need is invalid.
UTF-8 can store every character defined in the Unicode standard.
The only difference is that, for text in certain languages, UTF-8 can take more space to store its code points than, say, UTF-16. The opposite is also true: for certain other languages, such as English, using UTF-8 saves space.
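
The trade-off is simple arithmetic: UTF-8 needs 1 byte for U+0000 to U+007F, 2 bytes up to U+07FF, 3 bytes for the rest of the Basic Multilingual Plane and 4 bytes above U+FFFF, while UTF-16 needs 2 bytes per code unit (4 for a surrogate pair). The sketch below applies those ranges to a VBA string; Utf8Length and CompareSizes are hypothetical helpers, the input is assumed to be well-formed UTF-16, and `LenB` supplies the UTF-16 size for comparison.

```vba
' Sketch: count the bytes UTF-8 would need for s (assumes well-formed UTF-16).
Function Utf8Length(ByVal s As String) As Long
    Dim i As Long, c As Long, n As Long
    i = 1
    Do While i <= Len(s)
        c = AscW(Mid$(s, i, 1)) And &HFFFF&   ' code unit as 0..65535
        If c < &H80& Then
            n = n + 1                          ' U+0000..U+007F (ASCII)
        ElseIf c < &H800& Then
            n = n + 2                          ' U+0080..U+07FF (e.g. Cyrillic)
        ElseIf c >= &HD800& And c <= &HDBFF& Then
            n = n + 4                          ' surrogate pair = one 4-byte character
            i = i + 1                          ' skip the low surrogate
        Else
            n = n + 3                          ' rest of the BMP (e.g. CJK)
        End If
        i = i + 1
    Loop
    Utf8Length = n
End Function

Sub CompareSizes()
    Dim zh As String
    zh = ChrW$(&H4E2D&) & ChrW$(&H6587&)      ' two CJK characters
    Debug.Print Utf8Length("English"), LenB("English")   ' 7 (UTF-8) vs 14 (UTF-16)
    Debug.Print Utf8Length(zh), LenB(zh)                 ' 6 (UTF-8) vs 4 (UTF-16)
End Sub
```

English text is half the size in UTF-8, CJK text is larger, and Cyrillic text such as the sample word used below costs 2 bytes per character either way.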

Although VB6 and VBA store strings in memory as Unicode, they implicitly switch to ANSI (using the current system code page) when writing strings to files. The resulting file you get is NOT in UTF-8. It is in your current system code page, which, as you can discover in this helpful article, looks just like UTF-8 if you're in the USA: for plain ASCII text, the US code page (Windows-1252) produces exactly the same bytes as UTF-8.
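
The same kind of conversion can be observed with `StrConv(..., vbFromUnicode)`, which converts a string to the system ANSI code page. This sketch (ShowAnsiConversion is a hypothetical name) uses it as a stand-in for what Print # does and dumps the resulting bytes for the Cyrillic letter U+043F. On a Windows-1251 system it prints EF; on a code page without that letter, such as Windows-1252, the character is typically lost as 3F ("?"). Either way the bytes differ from the UTF-8 encoding, D0 BF.

```vba
' Sketch: show the bytes the system ANSI code page produces for one character.
' StrConv(vbFromUnicode) is used here as a stand-in for the conversion Print # applies.
Sub ShowAnsiConversion()
    Dim s As String, ansi() As Byte, i As Long
    s = ChrW$(&H43F&)                   ' Cyrillic small letter pe (U+043F)
    ansi = StrConv(s, vbFromUnicode)    ' UTF-16 -> current system code page
    For i = LBound(ansi) To UBound(ansi)
        Debug.Print Hex$(ansi(i)) & " ";  ' one byte on most code pages, two on DBCS ones
    Next i
    Debug.Print
End Sub
```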

Try:

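Dim s As String
s = "<?xml version='1.0' encoding='utf-16'?>"
' Russian word "proverka" ("test"), built from code points so the module itself stays ASCII
s = s & ChrW$(&H43F&) & ChrW$(&H440&) & ChrW$(&H43E&) & ChrW$(&H432&) & ChrW$(&H435&) & ChrW$(&H440&) & ChrW$(&H43A&) & ChrW$(&H430&)

' Assigning a String to a Byte array copies its in-memory UTF-16LE bytes as-is (no BOM)
Dim b() As Byte
b = s

' Binary mode writes the bytes unchanged but does not truncate an existing file,
' so delete any previous copy first
Dim iFile As Integer
iFile = FreeFile
If Len(Dir$("Unicode.txt")) > 0 Then Kill "Unicode.txt"
Open "Unicode.txt" For Binary Access Write As #iFile
Put #iFile, , b
Close #iFile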
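
To see why `b = s` yields UTF-16 without a BOM, dump the array: every character becomes two bytes, low byte first (UTF-16LE), and nothing is prepended. InspectUtf16Bytes is a hypothetical name for this small sketch.

```vba
' Sketch: print the bytes produced by assigning a String to a Byte array.
Sub InspectUtf16Bytes()
    Dim b() As Byte, i As Long
    b = "A" & ChrW$(&H43F&)            ' "A" (U+0041) then Cyrillic pe (U+043F)
    For i = LBound(b) To UBound(b)
        Debug.Print Hex$(b(i)) & " ";
    Next i
    Debug.Print                        ' prints: 41 0 3F 4
End Sub
```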


And if you absolutely must have UTF-8, you can make yourself some:

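Option Explicit

' WideCharToMultiByte converts a UTF-16 string to another encoding (here UTF-8).
' The VBA7 branch (Office 2010 and later) uses PtrSafe/LongPtr so the module also
' compiles in 64-bit Office; all pointer arguments are passed ByVal as addresses.
#If VBA7 Then
Private Declare PtrSafe Function WideCharToMultiByte Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long, ByVal lpMultiByteStr As LongPtr, ByVal cchMultiByte As Long, ByVal lpDefaultChar As LongPtr, ByVal lpUsedDefaultChar As LongPtr) As Long
#Else
Private Declare Function WideCharToMultiByte Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long
#End If

Private Const CP_UTF8 As Long = 65001

' Returns the UTF-8 bytes of s, without a BOM.
' For an empty string the result is an unallocated Byte array.
Public Function ToUTF8(s As String) As Byte()

  If Len(s) = 0 Then Exit Function

  ' First call with no output buffer: returns the number of bytes needed.
  ' For CP_UTF8, dwFlags is 0 and both default-char pointers must be NULL.
  Dim ccb As Long
  ccb = WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), 0, 0, 0, 0)

  If ccb = 0 Then
    Err.Raise 5, , "Internal error."
  End If

  Dim b() As Byte
  ReDim b(1 To ccb)

  ' Second call: write the UTF-8 bytes into the buffer.
  If WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), VarPtr(b(LBound(b))), ccb, 0, 0) = 0 Then
    Err.Raise 5, , "Internal error."
  Else
    ToUTF8 = b
  End If

End Function

Sub Test()
  Dim s As String
  s = "<?xml version='1.0' encoding='utf-8'?>"
  ' Russian word "proverka" ("test"), built from code points
  s = s & ChrW$(&H43F&) & ChrW$(&H440&) & ChrW$(&H43E&) & ChrW$(&H432&) & ChrW$(&H435&) & ChrW$(&H440&) & ChrW$(&H43A&) & ChrW$(&H430&)

  Dim b() As Byte
  b = ToUTF8(s)

  ' Binary mode writes the bytes unchanged but does not truncate an existing file,
  ' so delete any previous copy first.
  Dim iFile As Integer
  iFile = FreeFile
  If Len(Dir$("utf-8.txt")) > 0 Then Kill "utf-8.txt"
  Open "utf-8.txt" For Binary Access Write As #iFile
  Put #iFile, , b
  Close #iFile
End Sub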