UTF-16 to UTF-8 conversion in JavaScript
Original question: http://stackoverflow.com/questions/14592364/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Don P
I have Base64-encoded data that is in UTF-16. I am trying to decode the data, but most libraries only support UTF-8. I believe I have to drop the null bytes, but I am unsure how.
Currently I am using David Chambers' polyfill for Base64, but I have also tried other libraries such as phpjs.org, none of which support UTF-16.
One thing to point out is that in Chrome the atob method works without problems, in Firefox I get the results described here, and in IE only the first character is returned.
Any help is greatly appreciated.
Answered by Esailija
You want to decode UTF-16, not convert to UTF-8. Decoding means that the result is a string of abstract characters. Of course there is an internal encoding for strings as well, UTF-16 or UCS-2 in JavaScript, but that's an implementation detail.
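To make the remark about the internal encoding concrete, here is a small sketch (the variable names are only for illustration) showing that a JavaScript string exposes UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two of them:

```javascript
// U+1F600 written as its UTF-16 surrogate pair
var face = "\uD83D\uDE00";

var unitCount = face.length;          // counts UTF-16 code units, not characters
var firstUnit = face.charCodeAt(0);   // high surrogate
var secondUnit = face.charCodeAt(1);  // low surrogate
var codePoint = face.codePointAt(0);  // full code point (requires ES2015 or later)

console.log(unitCount, firstUnit.toString(16), secondUnit.toString(16), codePoint.toString(16));
// 2 d83d de00 1f600
```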
With strings the goal is that you don't have to worry about encodings but just about manipulating characters "as they are". So you can write string methods that don't need to decode input at all. Of course there are many edge cases where this falls apart.
You cannot decode UTF-16 just by removing nulls. This will work fine for the first 256 code points of Unicode, but you will get garbage when any of the other ~110,000 characters in Unicode are used. You cannot even get the most popular non-ASCII characters, like the em dash or smart quotes, working.
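The failure is easy to reproduce. The sketch below uses a hypothetical stripNulls helper to stand in for the "drop the null bytes" idea, and hand-written binary strings as input. It assumes nothing beyond standard string methods:

```javascript
// Hypothetical helper: the "drop the null bytes" approach from the question
function stripNulls(binaryStr) {
    return binaryStr.replace(/\x00/g, "");
}

// "hi" as UTF-16LE bytes: every high byte is 0x00, so stripping happens to work
var asciiBytes = "h\x00i\x00";
// "a", em dash (U+2014), "b" as UTF-16LE bytes: the dash is the byte pair 0x14 0x20
var dashBytes = "a\x00\x14\x20b\x00";

var asciiResult = stripNulls(asciiBytes); // "hi"
var dashResult = stripNulls(dashBytes);   // "a\x14 b": a control character and a space, no dash
```

Nothing gets removed from the dash's two bytes, so they survive as two separate, unrelated characters.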
Also, looking at your example, it looks like UTF-16LE.
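Byte order matters because the same two bytes produce different code units depending on which one is treated as the high byte. A small sketch, using the first two bytes (0x54 0x00) that the sample string in this answer's code decodes to; the variable names are illustrative:

```javascript
// First two bytes of the decoded sample "VABlAHMAdABpAG4AZwA"
var b0 = 0x54, b1 = 0x00;

// Little-endian: the first byte is the low byte
var asLittleEndian = String.fromCharCode(b0 | (b1 << 8)); // "T" (U+0054)
// Big-endian: the first byte is the high byte
var asBigEndian = String.fromCharCode((b0 << 8) | b1);    // "\u5400", a CJK ideograph
```

For mostly ASCII text, a zero in every second byte is a reasonable hint that the data is little-endian.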
// Braindead decoder that assumes fully valid input
function decodeUTF16LE( binaryStr ) {
    var cp = [];
    for( var i = 0; i < binaryStr.length; i+=2) {
        // Combine the low byte and the high byte into one UTF-16 code unit
        cp.push(
            binaryStr.charCodeAt(i) |
            ( binaryStr.charCodeAt(i+1) << 8 )
        );
    }
    return String.fromCharCode.apply( String, cp );
}

var base64decode = atob; // In Chrome and Firefox, atob is a native method available for base64 decoding
var base64 = "VABlAHMAdABpAG4AZwA";
var binaryStr = base64decode(base64);
var result = decodeUTF16LE(binaryStr);
// "Testing"
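A side note on the decoder above: String.fromCharCode.apply passes every code unit as a separate argument, and engines can throw a RangeError when that list gets very large. The following is a sketch of one possible workaround, not part of the original answer. The name decodeUTF16LEChunked and the default chunk size are assumptions chosen for illustration:

```javascript
// Hypothetical variant of decodeUTF16LE that converts the input in slices
function decodeUTF16LEChunked(binaryStr, chunkSize) {
    chunkSize = chunkSize || 8192; // code units per call; an arbitrary, conservative choice
    var parts = [];
    var cp = [];
    for (var i = 0; i < binaryStr.length; i += 2) {
        // Low byte first, high byte second (little-endian)
        cp.push(binaryStr.charCodeAt(i) | (binaryStr.charCodeAt(i + 1) << 8));
        if (cp.length === chunkSize) {
            parts.push(String.fromCharCode.apply(String, cp));
            cp = [];
        }
    }
    if (cp.length > 0) {
        parts.push(String.fromCharCode.apply(String, cp));
    }
    // Joining restores surrogate pairs that were split across two chunks
    return parts.join("");
}

var chunkedResult = decodeUTF16LEChunked(atob("VABlAHMAdABpAG4AZwA")); // "Testing"
```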
Now you can even get smart quotes working:
var base64 = "HCBoAGUAbABsAG8AHSA=";
var binaryStr = base64decode(base64);
var result = decodeUTF16LE(binaryStr);
//"“hello”"
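In runtimes that provide TextDecoder (current browsers and recent Node.js versions), the built-in decoder can do the same job. This is a sketch of an alternative, not the approach used in the answer above. The helper name decodeBase64UTF16LE is made up for illustration, and the code assumes atob is available globally:

```javascript
// Sketch assuming a runtime that provides both atob and TextDecoder
function decodeBase64UTF16LE(base64) {
    var binaryStr = atob(base64);
    var bytes = new Uint8Array(binaryStr.length);
    for (var i = 0; i < binaryStr.length; i++) {
        bytes[i] = binaryStr.charCodeAt(i); // each char of a binary string holds one byte
    }
    return new TextDecoder("utf-16le").decode(bytes);
}

var quoted = decodeBase64UTF16LE("HCBoAGUAbABsAG8AHSA="); // "“hello”"
```

Unlike the hand-written decoder, TextDecoder by default replaces malformed input such as lone surrogates with U+FFFD rather than passing it through.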