Conversion between UTF-8 ArrayBuffer and String
Notice: this page reproduces and translates a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/17191945/
Asked by Tom Leese
I have an ArrayBuffer which contains a string encoded using UTF-8, and I can't find a standard way of converting such an ArrayBuffer into a JS String (which I understand is encoded using UTF-16).
I've seen this code in numerous places, but I fail to see how it would work with any UTF-8 code points that are longer than 1 byte.
return String.fromCharCode.apply(null, new Uint8Array(data));
Similarly, I can't find a standard way of converting from a String to a UTF-8 encoded ArrayBuffer.
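To make the concern concrete, here is a minimal sketch comparing the one-liner above with a real UTF-8 decode. The sample bytes are chosen only for illustration, and the sketch assumes an environment where TextDecoder is available.

```javascript
// UTF-8 bytes for "é" (U+00E9): 0xC3 0xA9.
const data = new Uint8Array([0xc3, 0xa9]).buffer;

// Each byte is treated as its own UTF-16 code unit, so two characters come out.
const naive = String.fromCharCode.apply(null, new Uint8Array(data));

// A UTF-8 decoder combines the two bytes into a single code point.
const decoded = new TextDecoder("utf-8").decode(data);

console.log(naive.length);   // 2
console.log(decoded.length); // 1
```

The one-liner is only safe when every byte is below 0x80, that is, for pure ASCII input.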
Accepted answer by Niccolò Campolungo
function stringToUint(string) {
    // Encode to UTF-8 (as a binary string), then to base64.
    var string = btoa(unescape(encodeURIComponent(string))),
        charList = string.split(''),
        uintArray = [];
    for (var i = 0; i < charList.length; i++) {
        uintArray.push(charList[i].charCodeAt(0));
    }
    // Note: the array holds the bytes of the base64 text, not raw UTF-8.
    return new Uint8Array(uintArray);
}
function uintToString(uintArray) {
    // Reverse: bytes -> base64 text -> binary string -> JS string.
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(atob(encodedString)));
    return decodedString;
}
With some help from the internet, I wrote these little functions; they should solve your problem. Here is the working JSFiddle.
EDIT:
Since the source of the Uint8Array is external and you can't use atob, you just need to remove it (working fiddle):
function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(encodedString));
    return decodedString;
}
Warning: escape and unescape have been removed from the web standards. See this.
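Given that warning, the encoding direction can be sketched without unescape or btoa by reading the percent escapes that encodeURIComponent produces. This is only an illustrative sketch: the helper name stringToUtf8Bytes is hypothetical and not part of the answer above, and it returns raw UTF-8 bytes rather than base64 text.

```javascript
// Hypothetical helper: converts a JS string to raw UTF-8 bytes.
function stringToUtf8Bytes(str) {
  // encodeURIComponent writes every UTF-8 byte of a non-ASCII (or reserved)
  // character as "%XX" and leaves a small set of ASCII characters as they are.
  // It throws a URIError if the string contains a lone surrogate.
  const encoded = encodeURIComponent(str);
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      // "%XX" -> one byte
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      // Unescaped characters are plain ASCII, so the char code is the byte.
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return new Uint8Array(bytes);
}

console.log(Array.from(stringToUtf8Bytes("a€")).join(",")); // 97,226,130,172
```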
Answer by PPB
Using TextEncoder and TextDecoder:
// TextEncoder always encodes to UTF-8, so no encoding argument is needed.
var uint8array = new TextEncoder().encode("Plain Text");
var string = new TextDecoder().decode(uint8array);
console.log(uint8array, string);
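Since the question starts from an ArrayBuffer, the following sketch shows the same API applied to a buffer directly, plus the fatal option for rejecting malformed input. The sample text and variable names are illustrative, and the sketch assumes a runtime with the standard TextEncoder and TextDecoder.

```javascript
const original = "2H₂ + O₂";
const bytes = new TextEncoder().encode(original);

// Copy exactly the bytes covered by the view into a standalone ArrayBuffer.
const buffer = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);

// decode() accepts an ArrayBuffer as well as a typed array.
const roundTrip = new TextDecoder("utf-8").decode(buffer);
console.log(roundTrip === original); // true

// By default, malformed bytes become U+FFFD; with fatal: true they throw.
const lenient = new TextDecoder("utf-8").decode(new Uint8Array([0xff]));
const strict = new TextDecoder("utf-8", { fatal: true });
let rejected = false;
try {
  strict.decode(new Uint8Array([0xff]));
} catch (e) {
  rejected = e instanceof TypeError;
}
console.log(lenient === "\ufffd", rejected); // true true
```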
Answer by Albert
This should work:
// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt
/* utf.js - UTF-8 <=> UTF-16 convertion
*
* Copyright (C) 1999 Masanao Izumo <[email protected]>
* Version: 1.0
* LastModified: Dec 25 1999
* This library is free. You can redistribute it and/or modify it.
*/
function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3;
    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        switch (c >> 4)
        {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += String.fromCharCode(c);
                break;
            case 12: case 13:
                // 110x xxxx 10xx xxxx
                char2 = array[i++];
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx 10xx xxxx 10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
        }
    }
    return out;
}
It's somewhat cleaner than the other solutions because it doesn't use any hacks and doesn't depend on browser JS functions, so it also works in other JS environments.
Check out the JSFiddle demo.
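One thing to be aware of: the switch above has no case for lead bytes of the form 1111 0xxx, so four-byte sequences (code points above U+FFFF, such as emoji) are silently skipped. A minimal sketch of how such a sequence could be decoded into a UTF-16 surrogate pair follows; the helper name decodeFourByteSequence is hypothetical and the sketch does no validation of the input bytes.

```javascript
// Hypothetical helper: decodes one 4-byte UTF-8 sequence
// (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx) into a JS string.
function decodeFourByteSequence(b1, b2, b3, b4) {
  const codePoint = ((b1 & 0x07) << 18) |
                    ((b2 & 0x3f) << 12) |
                    ((b3 & 0x3f) << 6) |
                    (b4 & 0x3f);
  // Code points above U+FFFF need two UTF-16 code units (a surrogate pair).
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return String.fromCharCode(high, low);
}

// U+1F4A9 is encoded in UTF-8 as F0 9F 92 A9.
console.log(decodeFourByteSequence(0xf0, 0x9f, 0x92, 0xa9) === "\uD83D\uDCA9"); // true
```

In the function above, this logic would correspond to an extra case 15 that reads three continuation bytes.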
Answer by popham
There's a polyfill for the Encoding API over on GitHub: text-encoding. It's easy to use in Node or the browser, and the README advises the following:
// The native constructors must be called with `new`.
var uint8array = new TextEncoder(encoding).encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
If I recall, 'utf-8' is the encoding you need, and of course you'll need to wrap your buffer:
var uint8array = new Uint8Array(utf8buffer);
Hope it works as well for you as it has for me.
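When the text occupies only part of a larger buffer, the wrapping step can also take an offset and a length. The following is a small sketch under that assumption; the buffer layout (two leading bytes and one trailing byte around the text) is invented for illustration, and it uses the native TextDecoder rather than the polyfill.

```javascript
// An 8-byte buffer where only bytes 2..6 hold UTF-8 text ("hi♡").
const buffer = new ArrayBuffer(8);
new Uint8Array(buffer).set([0x00, 0x00, 0x68, 0x69, 0xe2, 0x99, 0xa1, 0x00]);

// Wrap just the text region: byte offset 2, length 5. No bytes are copied.
const view = new Uint8Array(buffer, 2, 5);
const text = new TextDecoder("utf-8").decode(view);

console.log(text === "hi\u2661"); // true
```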
Answer by Esailija
If you are doing this in the browser, there are no character encoding libraries built in, but you can get by with:
function pad(n) {
    return n.length < 2 ? "0" + n : n;
}
var array = new Uint8Array(data);
var str = "";
for (var i = 0, len = array.length; i < len; ++i) {
    str += ("%" + pad(array[i].toString(16)));
}
str = decodeURIComponent(str);
Here's a demo that decodes a 3-byte UTF-8 sequence: http://jsfiddle.net/Z9pQE/
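One property of this approach worth knowing is that decodeURIComponent throws a URIError when the bytes are not valid UTF-8. A minimal sketch of wrapping the technique so that malformed input is reported instead of thrown follows; the function name tryDecodeUtf8 and the choice of returning null are assumptions made for illustration.

```javascript
// Hypothetical wrapper: returns the decoded string, or null for malformed UTF-8.
function tryDecodeUtf8(bytes) {
  let encoded = "";
  for (const b of bytes) {
    // Percent-encode every byte as two hex digits.
    encoded += "%" + b.toString(16).padStart(2, "0");
  }
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    if (e instanceof URIError) return null; // invalid or truncated sequence
    throw e;
  }
}

console.log(tryDecodeUtf8(new Uint8Array([0xe2, 0x82, 0xac])) === "€"); // true
console.log(tryDecodeUtf8(new Uint8Array([0xe2, 0x82])));               // null
```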
Answer by Martin Wantke
The readAsArrayBuffer and readAsText methods of a FileReader object asynchronously convert a Blob object to an ArrayBuffer or to a DOMString.
A Blob object can be created from raw text or a byte array, for example:
let blob = new Blob([text], { type: "text/plain" });
let reader = new FileReader();
reader.onload = event =>
{
    let buffer = event.target.result;
};
reader.readAsArrayBuffer(blob);
I think it's better to wrap this in a promise-like helper:
function textToByteArray(text)
{
    let blob = new Blob([text], { type: "text/plain" });
    let reader = new FileReader();
    let done = function() { };
    reader.onload = event =>
    {
        done(new Uint8Array(event.target.result));
    };
    reader.readAsArrayBuffer(blob);
    return { done: function(callback) { done = callback; } }
}
function byteArrayToText(bytes, encoding)
{
    let blob = new Blob([bytes], { type: "application/octet-stream" });
    let reader = new FileReader();
    let done = function() { };
    reader.onload = event =>
    {
        done(event.target.result);
    };
    if(encoding) { reader.readAsText(blob, encoding); } else { reader.readAsText(blob); }
    return { done: function(callback) { done = callback; } }
}
let text = "\uD83D\uDCA9 = \u2661";
textToByteArray(text).done(bytes =>
{
    console.log(bytes);
    byteArrayToText(bytes, 'UTF-8').done(text =>
    {
        console.log(text); // 💩 = ♡
    });
});
Answer by Rosberg Linhares
If you don't want to use any external polyfill library, you can use this function provided by the Mozilla Developer Network website:
function utf8ArrayToString(aBytes) {
    var sView = "";
    for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
        nPart = aBytes[nIdx];
        sView += String.fromCharCode(
            nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
                /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
                (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
                (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
                (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
                (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
                (nPart - 192 << 6) + aBytes[++nIdx] - 128
            : /* nPart < 127 ? */ /* one byte */
                nPart
        );
    }
    return sView;
}
let str = utf8ArrayToString([50,72,226,130,130,32,43,32,79,226,130,130,32,226,135,140,32,50,72,226,130,130,79]);
// Must show 2H₂ + O₂ ⇌ 2H₂O
console.log(str);
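A caveat worth checking before relying on the four-byte branch above: String.fromCharCode keeps only the low 16 bits of each argument, so a code point above U+FFFF does not come out as the intended character. The short sketch below illustrates the difference with String.fromCodePoint, which is a substitution suggested here and not part of the quoted function.

```javascript
const codePoint = 0x1f4a9; // above U+FFFF, needs a surrogate pair in UTF-16

// fromCharCode truncates to 16 bits: 0x1F4A9 becomes 0xF4A9.
const truncated = String.fromCharCode(codePoint);

// fromCodePoint emits the correct surrogate pair.
const correct = String.fromCodePoint(codePoint);

console.log(truncated === "\uF4A9");     // true
console.log(correct === "\uD83D\uDCA9"); // true
```

Note that String.fromCodePoint throws a RangeError for values above 0x10FFFF, so the five-byte and six-byte branches would need separate handling if it were swapped in.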
Answer by Tchakabam
The most recent answers to this type of question (using modern methods) are here: Converting between strings and ArrayBuffers
Answer by konak
The main difficulty for programmers converting a byte array into a string is handling the variable-length UTF-8 encoding of Unicode characters. This code will help you:
var getString = function (strBytes) {
    var MAX_SIZE = 0x4000;
    var codeUnits = [];
    var highSurrogate;
    var lowSurrogate;
    var index = -1;
    var result = '';
    while (++index < strBytes.length) {
        var codePoint = Number(strBytes[index]);
        if (codePoint === (codePoint & 0x7F)) {
        } else if (0xF0 === (codePoint & 0xF0)) {
            codePoint ^= 0xF0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xE0 === (codePoint & 0xE0)) {
            codePoint ^= 0xE0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xC0 === (codePoint & 0xC0)) {
            codePoint ^= 0xC0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        }
        if (!isFinite(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || Math.floor(codePoint) != codePoint)
            throw RangeError('Invalid code point: ' + codePoint);
        if (codePoint <= 0xFFFF)
            codeUnits.push(codePoint);
        else {
            codePoint -= 0x10000;
            highSurrogate = (codePoint >> 10) | 0xD800;
            lowSurrogate = (codePoint % 0x400) | 0xDC00;
            codeUnits.push(highSurrogate, lowSurrogate);
        }
        if (index + 1 == strBytes.length || codeUnits.length > MAX_SIZE) {
            result += String.fromCharCode.apply(null, codeUnits);
            codeUnits.length = 0;
        }
    }
    return result;
};
All the best!
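The MAX_SIZE flush in the code above reflects a general constraint: passing a very large array to String.fromCharCode.apply can exceed an engine's argument limit. A minimal sketch of that chunking idea in isolation follows; the helper name codeUnitsToString and the default chunk size are illustrative assumptions, and the actual limit varies by engine.

```javascript
// Hypothetical helper: joins UTF-16 code units into a string in bounded chunks.
function codeUnitsToString(codeUnits, chunkSize = 0x4000) {
  let result = "";
  for (let i = 0; i < codeUnits.length; i += chunkSize) {
    // Each apply() call receives at most chunkSize arguments.
    result += String.fromCharCode.apply(null, codeUnits.slice(i, i + chunkSize));
  }
  return result;
}

const many = new Array(100000).fill(0x61); // 100000 copies of "a"
console.log(codeUnitsToString(many).length); // 100000
```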