VBA - 将字符串转换为 UNICODE
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/23810324/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
VBA - Convert string to UNICODE
提问by Trenera
I need to convert the string HTML from a mix of Cyrillic and Latin symbols to UNICODE.
我需要将字符串 HTML 从西里尔字母和拉丁符号的混合转换为 UNICODE。
I tried the following:
我尝试了以下方法:
Public HTML As String
Sub HTMLsearch()
GetHTML ("http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1")
MsgBox HTML
HTML = StrConv(HTML, vbUnicode)
MsgBox HTML
End Sub
Function GetHTML(URL As String) As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
HTML = .ResponseText
End With
End Function
You can see what is before and after the StrConv. If you like to get the html in a file, you can use the following code:
您可以看到 StrConv 之前和之后的内容。如果您想获取文件中的 html,可以使用以下代码:
Public HTML As String
Sub HTMLsearch()
GetHTML ("http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1")
Dim path As String
path = ThisWorkbook.path & "\html.txt"
Open path For Output As #1
Print #1, HTML
Close #1
HTML = StrConv(HTML, vbUnicode)
path = ThisWorkbook.path & "\htmlUNICODE.txt"
Open path For Output As #1
Print #1, HTML
Close #1
End Sub
Function GetHTML(URL As String) As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.Send
HTML = .ResponseText
End With
End Function
IDEAS?
想法?
回答by Tomalak
VBA's support for Unicode is not all that great.
VBA 对 Unicode 的支持并不是那么好。
It is possible to handle Unicode strings, but you will not be able to see the actual characters with Debug.Print
or MsgBox
- they will appear as ?
there.
可以处理 Unicode 字符串,但您将无法看到带有Debug.Print
或的实际字符MsgBox
- 它们将在?
那里显示。
You can set Control Panel > Region and Language > Administrative tab > "Current language for non-Unicode programs"to "Russian" switch to a different code page, which would allow you to see Cyrillic letters in VBA message boxes instead of question marks. But that's only a cosmetic change.
您可以将控制面板 > 区域和语言 > 管理选项卡 > “非 Unicode 程序的当前语言”设置为“俄语”切换到不同的代码页,这将允许您在 VBA 消息框中看到西里尔字母而不是问号。但这只是表面上的改变。
Your real problem is something else here.
你真正的问题是这里的其他问题。
The server (nfs.mobile.bg) sends the document as Content-Type: text/html
. There is no information about character encoding. That means the receiver must figure out character encoding on its own.
服务器 (nfs.mobile.bg) 将文档作为Content-Type: text/html
. 没有关于字符编码的信息。这意味着接收方必须自己弄清楚字符编码。
A browser does that by looking at the response byte stream and making guesses. In your case, a helpful <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
tag is present in the HMTL source. Therefore, the byte stream should be interpreted as Windows-1251
, which happens to be the Cyrillic ANSI code page in Windows.
浏览器通过查看响应字节流并进行猜测来做到这一点。在您的情况下,<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
HMTL 源中存在一个有用的标记。因此,字节流应该被解释为Windows-1251
,这恰好是 Windows 中的 Cyrillic ANSI 代码页。
So, we do not even have Unicode here!
所以,我们这里甚至没有 Unicode!
In the absence of any additional info, the responseText
property of the XMLHTTP
object defaults to us-ascii
. The extended characters from the Cyrillic alphabet are not present in ASCII, so they will be converted to actualquestion marks and are lost. That's why you can't use responseText
for anything.
在没有任何附加信息的情况下,对象的responseText
属性XMLHTTP
默认为us-ascii
。来自西里尔字母的扩展字符不存在于 ASCII 中,因此它们将被转换为实际的问号并丢失。这就是为什么你不能用responseText
任何东西。
However, the originalbytes of the response are still available, in the responseBody
property, which is an array of Byte
.
然而,响应的原始字节仍然可用,在responseBody
属性中,它是一个Byte
.
In VBA you must do the same thing a browser would do. You must interpret the byte-stream as a certain character set. The ADODB.Stream
object can do that for you, and it's pretty straight-forward, too:
在 VBA 中,您必须做与浏览器相同的事情。您必须将字节流解释为某个字符集。该ADODB.Stream
对象可以为您做到这一点,而且它也非常简单:
' reference: "Microsoft XML, v6.0" (or any other version)
' reference: "Microsoft ActiveX Data Objects 6.1 library" (or any other version)
Option Explicit
Sub HTMLsearch()
Dim url As String, html As String
url = "http://nfs.mobile.bg/pcgi/mobile.cgi?act=3&slink=6jkjov&f1=1"
html = GetHTML(url, "Windows-1251")
' Cyrillic characters are supported in Office, so they will appear correctly
ActiveDocument.Range.InsertAfter html
End Sub
Function GetHTML(Url As String, Optional Charset As String = "UTF-8") As String
Dim request As New MSXML2.XMLHTTP
Dim converter As New ADODB.stream
' fetch page
request.Open "GET", Url, False
request.send
' write raw bytes to the stream
converter.Open
converter.Type = adTypeBinary
converter.Write request.responseBody
' switch the stream to text mode and set charset
converter.Position = 0
converter.Type = adTypeText
converter.Charset = Charset
' read text characters from the stream, close the stream
GetHTML = converter.ReadText
converter.Close
End Function
I've been using MS Word here and calling HTMLsearch()
properly writes Cyrillic characters to the page. They still do appear as ?
in a MsgBox
for me, though, but now that's purely a display problem, caused by the fact that VBA-created UI cannot deal with Unicode.
我一直在这里使用 MS Word 并HTMLsearch()
正确调用将西里尔字符写入页面。不过,它们对我来说仍然显示为?
a MsgBox
,但现在这纯粹是一个显示问题,因为 VBA 创建的 UI 无法处理 Unicode。
回答by hamish
My production order data comes from many countries. this is the only vba function I could find that really works.
我的生产订单数据来自许多国家。这是我能找到的唯一真正有效的 vba 函数。
Private Const CP_UTF8 = 65001
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
ByVal CodePage As Long, ByVal dwFlags As Long, _
ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Public Function sUTF8ToUni(bySrc() As Byte) As String
' Converts a UTF-8 byte array to a Unicode string
Dim lBytes As Long, lNC As Long, lRet As Long
lBytes = UBound(bySrc) - LBound(bySrc) + 1
lNC = lBytes
sUTF8ToUni = String$(lNC, Chr(0))
lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bySrc(LBound(bySrc))), lBytes, StrPtr(sUTF8ToUni), lNC)
sUTF8ToUni = Left$(sUTF8ToUni, lRet)
End Function
Example Usage:
示例用法:
Dim sHTML As String
Dim bHTML() As Byte
bHTML = GetHTML("http://yoururlhere/myorderdata.php")
sHTML = sUTF8ToUni(bHTML)
sHTML = Mid(sHTML, 2) 'strip off Byte Order Mark: EF BB BF