Converting between a UTF-8 ArrayBuffer and a String

Question

Asked by Tom Leese

I have an ArrayBuffer which contains a string encoded using UTF-8, and I can't find a standard way of converting such an ArrayBuffer into a JS String (which I understand is encoded using UTF-16).

I've seen this code in numerous places, but I fail to see how it would work with any UTF-8 code points that are longer than 1 byte.

return String.fromCharCode.apply(null, new Uint8Array(data));
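
To make the concern concrete, here is a small sketch (the sample bytes and variable names are my own, not from the question). String.fromCharCode treats every byte as a separate UTF-16 code unit, so ASCII survives but a multi-byte sequence is split into unrelated characters:

```javascript
// "café" in UTF-8: the "é" takes two bytes, 0xC3 0xA9.
var bytes = new Uint8Array([0x63, 0x61, 0x66, 0xC3, 0xA9]);

// Each byte becomes its own character, so the two-byte sequence is split.
var naive = String.fromCharCode.apply(null, bytes);

console.log(naive);        // "cafÃ©"
console.log(naive.length); // 5, not 4
```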

Similarly, I can't find a standard way of converting from a String to a UTF-8 encoded ArrayBuffer.

Answer 1

Accepted answer by Niccolò Campolungo

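function stringToUint(string) {
    // unescape(encodeURIComponent(...)) yields one character per UTF-8 byte.
    // btoa then Base64-encodes that, so the array holds Base64 text rather
    // than raw UTF-8 bytes; uintToString below reverses it with atob.
    var encoded = btoa(unescape(encodeURIComponent(string))),
        charList = encoded.split(''),
        uintArray = [];
    for (var i = 0; i < charList.length; i++) {
        uintArray.push(charList[i].charCodeAt(0));
    }
    return new Uint8Array(uintArray);
}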

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(atob(encodedString)));
    return decodedString;
}

With some help from the internet, I wrote these little functions; they should solve your problem. Here is the working JSFiddle.

EDIT:

Since the source of the Uint8Array is external and you can't use atob, you just need to remove it (working fiddle):

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(encodedString));
    return decodedString;
}

Warning: escape and unescape have been removed from web standards. See this.

Answer 2

Answered by PPB

Using TextEncoder and TextDecoder:

// TextEncoder always produces UTF-8, so its constructor needs no argument.
var uint8array = new TextEncoder().encode("Plain Text");
var string = new TextDecoder("utf-8").decode(uint8array);
console.log(uint8array, string);
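
Since the question is about an ArrayBuffer rather than a Uint8Array, a round trip through an actual ArrayBuffer may be useful. This is a minimal sketch, assuming TextEncoder and TextDecoder are available as globals (modern browsers and recent Node.js versions); the sample string and variable names are illustrative only:

```javascript
var original = "h\u00E9llo \u20AC \uD83D\uDCA9"; // "héllo € 💩"

// String -> UTF-8 bytes.
var encoded = new TextEncoder().encode(original);

// Copy exactly the bytes of the view into a standalone ArrayBuffer,
// in case the view does not span its whole underlying buffer.
var arrayBuffer = encoded.buffer.slice(
    encoded.byteOffset, encoded.byteOffset + encoded.byteLength);

// ArrayBuffer -> String; decode() accepts an ArrayBuffer or a typed-array view.
var roundTripped = new TextDecoder("utf-8").decode(arrayBuffer);

console.log(arrayBuffer.byteLength);    // 15 (1-, 2-, 3- and 4-byte characters)
console.log(roundTripped === original); // true
```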

Answer 3

Answered by Albert

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <[email protected]>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;
  while (i < len) {
    c = array[i++];
    switch (c >> 4)
    { 
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx   10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx  10xx xxxx  10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) |
                                   ((char2 & 0x3F) << 6) |
                                   ((char3 & 0x3F) << 0));
        break;
    }
  }    
  return out;
}

It's somewhat cleaner than the other solutions because it doesn't use any hacks and doesn't depend on browser JS functions, so it also works in other JS environments.
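
One limitation worth noting: the switch in Utf8ArrayToStr has no branch for lead bytes 0xF0 to 0xF7 (c >> 4 === 15), so four-byte sequences (code points above U+FFFF, such as emoji) are silently dropped. A possible extension is sketched below as a standalone helper. The name decodeFourByteSequence is hypothetical and not part of utf.js, and the sketch assumes well-formed input:

```javascript
// Hypothetical helper (not part of utf.js): decode one four-byte UTF-8
// sequence starting at index i and return it as a UTF-16 surrogate pair.
// No validation is performed; the input is assumed to be well-formed.
function decodeFourByteSequence(array, i) {
  // 1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx
  var codePoint = ((array[i] & 0x07) << 18) |
                  ((array[i + 1] & 0x3F) << 12) |
                  ((array[i + 2] & 0x3F) << 6) |
                  (array[i + 3] & 0x3F);
  // Code points above U+FFFF need two UTF-16 code units.
  var offset = codePoint - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10),
                             0xDC00 + (offset & 0x3FF));
}

// U+1F4A9 is encoded in UTF-8 as F0 9F 92 A9.
var pair = decodeFourByteSequence([0xF0, 0x9F, 0x92, 0xA9], 0);
console.log(pair === "\uD83D\uDCA9"); // true
```

Inside Utf8ArrayToStr this would correspond to a case 15 branch that appends the pair and advances i past the three continuation bytes.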

Check out the JSFiddle demo.

Also see the related questions: here, here

Answer 4

Answered by popham

There's a polyfill for Encoding over on GitHub: text-encoding. It's easy to use in Node or the browser, and the README advises the following:

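// Native implementations require `new`; calling the constructors as plain functions throws.
var uint8array = new TextEncoder(encoding).encode(string);
var string = new TextDecoder(encoding).decode(uint8array);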

If I recall correctly, 'utf-8' is the encoding you need, and of course you'll need to wrap your buffer:

var uint8array = new Uint8Array(utf8buffer);
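
If the UTF-8 text occupies only part of a larger buffer, the view can be restricted to that range. This is a minimal sketch, assuming a global TextDecoder and a made-up layout (a two-byte header followed by the text); the buffer contents here are built by hand purely for illustration:

```javascript
// Made-up example buffer: two header bytes, then the UTF-8 bytes of "€5".
var utf8buffer = new Uint8Array([0x00, 0x03, 0xE2, 0x82, 0xAC, 0x35]).buffer;

// new Uint8Array(buffer, byteOffset, length) creates a view without copying.
var payload = new Uint8Array(utf8buffer, 2, 4);

var text = new TextDecoder("utf-8").decode(payload);
console.log(text); // "€5"
```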

Hope it works as well for you as it has for me.

Answer 5

Answered by Esailija

If you are doing this in the browser, there are no built-in character encoding libraries, but you can get by with:

function pad(n) {
    return n.length < 2 ? "0" + n : n;
}

var array = new Uint8Array(data);
var str = "";
for( var i = 0, len = array.length; i < len; ++i ) {
    str += ( "%" + pad(array[i].toString(16)))
}

str = decodeURIComponent(str);

Here's a demo that decodes a 3-byte UTF-8 unit: http://jsfiddle.net/Z9pQE/
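
The same idea can be packaged as a function. This is only a sketch: the name utf8BytesToString and the sample bytes are my own, and decodeURIComponent throws a URIError when the bytes are not valid UTF-8, so callers may want to catch that.

```javascript
// Hypothetical wrapper around the percent-encoding trick shown above.
function utf8BytesToString(data) {
    var array = new Uint8Array(data);
    var str = "";
    for (var i = 0, len = array.length; i < len; ++i) {
        var hex = array[i].toString(16);
        // Every byte must be written as exactly two hex digits: %0a, %e2, ...
        str += "%" + (hex.length < 2 ? "0" + hex : hex);
    }
    // decodeURIComponent interprets the %XX escapes as UTF-8 bytes.
    return decodeURIComponent(str);
}

// E2 82 AC is the three-byte UTF-8 encoding of U+20AC (euro sign).
var euroBuffer = new Uint8Array([0x41, 0xE2, 0x82, 0xAC]).buffer;
console.log(utf8BytesToString(euroBuffer)); // "A€"
```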

Answer 6

Answered by Martin Wantke

The methods readAsArrayBuffer and readAsText of a FileReader object asynchronously convert a Blob object to an ArrayBuffer or to a DOMString.

A Blob object type can be created from a raw text or byte array, for example.

let blob = new Blob([text], { type: "text/plain" });

let reader = new FileReader();
reader.onload = event =>
{
    let buffer = event.target.result;
};
reader.readAsArrayBuffer(blob);

I think it's better to wrap this in a promise-style helper:

function textToByteArray(text)
{
    let blob = new Blob([text], { type: "text/plain" });
    let reader = new FileReader();
    let done = function() { };

    reader.onload = event =>
    {
        done(new Uint8Array(event.target.result));
    };
    reader.readAsArrayBuffer(blob);

    return { done: function(callback) { done = callback; } }
}

function byteArrayToText(bytes, encoding)
{
    let blob = new Blob([bytes], { type: "application/octet-stream" });
    let reader = new FileReader();
    let done = function() { };

    reader.onload = event =>
    {
        done(event.target.result);
    };

    if(encoding) { reader.readAsText(blob, encoding); } else { reader.readAsText(blob); }

    return { done: function(callback) { done = callback; } }
}

let text = "\uD83D\uDCA9 = \u2661";
textToByteArray(text).done(bytes =>
{
    console.log(bytes);
    byteArrayToText(bytes, 'UTF-8').done(text => 
    {
        console.log(text); // 💩 = ♡
    });
});

Answer 7

Answered by Rosberg Linhares

If you don't want to use any external polyfill library, you can use this function provided by the Mozilla Developer Network website:

function utf8ArrayToString(aBytes) {
    var sView = "";
    
    for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
        nPart = aBytes[nIdx];
        
        sView += String.fromCodePoint( /* fromCharCode would truncate code points above 0xFFFF */
            nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
                /* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
                (nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
                (nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
                (nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
                (nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
            : nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
                (nPart - 192 << 6) + aBytes[++nIdx] - 128
            : /* nPart < 127 ? */ /* one byte */
                nPart
        );
    }
    
    return sView;
}

let str = utf8ArrayToString([50,72,226,130,130,32,43,32,79,226,130,130,32,226,135,140,32,50,72,226,130,130,79]);

// Must show 2H₂ + O₂ ⇌ 2H₂O
console.log(str);

Answer 8

Answered by Tchakabam

The most up-to-date answers to this type of question (using modern methods) are here: Converting between strings and ArrayBuffers

Answer 9

Answered by konak

The main difficulty when converting a byte array into a string is decoding UTF-8, which stores each Unicode character in a variable number of bytes. This code will help you:

var getString = function (strBytes) {

    var MAX_SIZE = 0x4000;
    var codeUnits = [];
    var highSurrogate;
    var lowSurrogate;
    var index = -1;

    var result = '';

    while (++index < strBytes.length) {
        var codePoint = Number(strBytes[index]);

        if (codePoint === (codePoint & 0x7F)) {

        } else if (0xF0 === (codePoint & 0xF0)) {
            codePoint ^= 0xF0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xE0 === (codePoint & 0xE0)) {
            codePoint ^= 0xE0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xC0 === (codePoint & 0xC0)) {
            codePoint ^= 0xC0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        }

        if (!isFinite(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || Math.floor(codePoint) != codePoint)
            throw RangeError('Invalid code point: ' + codePoint);

        if (codePoint <= 0xFFFF)
            codeUnits.push(codePoint);
        else {
            codePoint -= 0x10000;
            highSurrogate = (codePoint >> 10) | 0xD800;
            lowSurrogate = (codePoint % 0x400) | 0xDC00;
            codeUnits.push(highSurrogate, lowSurrogate);
        }
        if (index + 1 == strBytes.length || codeUnits.length > MAX_SIZE) {
            result += String.fromCharCode.apply(null, codeUnits);
            codeUnits.length = 0;
        }
    }

    return result;
}
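
The function above only covers bytes to string. For the opposite direction, a hand-rolled encoder might look like the following sketch. The name getBytes is hypothetical, it relies on String.prototype.codePointAt (ES2015), and unpaired surrogates are encoded as-is rather than replaced, which differs from what TextEncoder does.

```javascript
// Hypothetical counterpart to getString: String -> Uint8Array of UTF-8 bytes.
var getBytes = function (str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        var codePoint = str.codePointAt(i);
        if (codePoint > 0xFFFF) {
            i++; // the code point used two UTF-16 code units (a surrogate pair)
        }
        if (codePoint < 0x80) {
            bytes.push(codePoint);                       // 0xxxxxxx
        } else if (codePoint < 0x800) {
            bytes.push(0xC0 | (codePoint >> 6),          // 110xxxxx
                       0x80 | (codePoint & 0x3F));       // 10xxxxxx
        } else if (codePoint < 0x10000) {
            bytes.push(0xE0 | (codePoint >> 12),         // 1110xxxx
                       0x80 | ((codePoint >> 6) & 0x3F),
                       0x80 | (codePoint & 0x3F));
        } else {
            bytes.push(0xF0 | (codePoint >> 18),         // 11110xxx
                       0x80 | ((codePoint >> 12) & 0x3F),
                       0x80 | ((codePoint >> 6) & 0x3F),
                       0x80 | (codePoint & 0x3F));
        }
    }
    return new Uint8Array(bytes);
};

console.log(getBytes("\u20AC"));       // bytes 226, 130, 172
console.log(getBytes("\uD83D\uDCA9")); // bytes 240, 159, 146, 169
```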

All the best!
