JavaScript equivalent of Java's string.getBytes("UTF-8")

Question

Asked by Wesley

I have this string in java:

"test.message"

String plaintext = "test.message";
byte[] bytes = plaintext.getBytes("UTF-8");
// result: [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]

If I do the same thing in javascript:

    stringToByteArray: function (str) {         
        str = unescape(encodeURIComponent(str));

        var bytes = new Array(str.length);
        for (var i = 0; i < str.length; ++i)
            bytes[i] = str.charCodeAt(i);

        return bytes;
    },

I get:

 [7,163,140,72,178,72,244,241,149,43,67,124]

I was under the impression that the unescape(encodeURIComponent()) would correctly translate the string to UTF-8. Is this not the case?

Reference:

http://ecmanaut.blogspot.be/2006/07/encoding-decoding-utf8-in-javascript.html

Answer 1

Accepted answer by Paul S.

JavaScript has no concept of character encoding for String; everything is in UTF-16. For ASCII characters (code points below 128) the value of a char in UTF-16 matches its single UTF-8 byte, so for such strings you can forget it's any different.

There are more efficient ways to do this, but for example:

function s(x) {return x.charCodeAt(0);}
"test.message".split('').map(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
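
The charCodeAt approach above only matches Java's getBytes("UTF-8") for ASCII input. A small sketch of where the two diverge (codeUnits is a hypothetical helper name and the sample strings are assumptions, not taken from the question):

```javascript
// Returns the UTF-16 code unit of each character, as in the snippet above.
function codeUnits(str) {
    return str.split('').map(function (ch) { return ch.charCodeAt(0); });
}

// ASCII: each code unit equals the single UTF-8 byte.
var ascii = codeUnits("test");        // [116, 101, 115, 116]

// Non-ASCII: U+00E9 is one code unit (233), but its UTF-8 encoding
// is the two bytes 0xC3 0xA9, so the results no longer agree.
var accented = codeUnits("\u00e9");   // [233]

console.log(ascii, accented);
```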

So what is unescape(encodeURIComponent(str)) doing? Let's look at each individually:

encodeURIComponent converts every character in str that is illegal or has a special meaning in URI syntax into a URI-escaped version, so that it can safely be used as a key or value in the search component of a URI. For example, encodeURIComponent('&=') returns "%26%3D"; notice how this is now a 6-character String. Non-ASCII characters are escaped as the percent-encoded bytes of their UTF-8 encoding.
unescape is actually deprecated, but it does a similar job to decodeURI or decodeURIComponent (the reverse of encodeURIComponent). If we look in the ES5 spec we can see: 11. Let c be the character whose code unit value is the integer represented by the four hexadecimal digits at positions k+2, k+3, k+4, and k+5 within Result(1).
That step covers the %uXXXX form; for a plain %XX escape, unescape produces the character whose code unit value is that single byte. So the result of unescape(encodeURIComponent(str)) has one character per UTF-8 byte, each with a value from 0 to 255. However, as I mentioned, all Strings are UTF-16, so it's really a UTF-16 string limiting itself to UTF-8 byte values.
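
The combined effect can be sketched without the deprecated unescape by reading the %XX escapes directly. This is only an illustration: utf8Bytes is a hypothetical helper name, and the sample inputs are assumptions.

```javascript
// Hypothetical helper: UTF-8 byte values of a string, via encodeURIComponent.
function utf8Bytes(str) {
    // Non-ASCII and reserved characters become %XX, one escape per UTF-8 byte.
    var escaped = encodeURIComponent(str);
    var bytes = [];
    for (var i = 0; i < escaped.length; i++) {
        if (escaped.charAt(i) === '%') {
            // "%C3" -> 0xC3: the two hex digits after '%' are one byte.
            bytes.push(parseInt(escaped.slice(i + 1, i + 3), 16));
            i += 2;
        } else {
            // Characters left unescaped are ASCII, so code unit == byte.
            bytes.push(escaped.charCodeAt(i));
        }
    }
    return bytes;
}

console.log(utf8Bytes("test.message"));
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
console.log(utf8Bytes("\u00e9"));
// [195, 169]
```

Note that encodeURIComponent throws a URIError for strings containing a lone surrogate, so this sketch assumes well-formed input.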

Answer 2

Answer by Kevin Hakanson

You can use TextEncoder, which is part of the Encoding Living Standard. According to the Encoding API entry from the Chromium Dashboard, it shipped in Firefox and will ship in Chrome 38. There is also a text-encoding polyfill available.

The JavaScript code sample below returns a Uint8Array filled with the values you expect.

var s = "test.message";
var encoder = new TextEncoder();
encoder.encode(s);
// [116, 101, 115, 116, 46, 109, 101, 115, 115, 97, 103, 101]
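
When comparing against Java output for non-ASCII text, keep in mind that Java's byte type is signed, so values above 127 print as negative numbers. A minimal sketch, assuming an environment where TextEncoder is available as a global (the sample string is an assumption):

```javascript
var encoder = new TextEncoder();             // TextEncoder always produces UTF-8
var bytes = encoder.encode("h\u00e9llo");    // Uint8Array
var plain = Array.from(bytes);               // plain Array of unsigned values
// [104, 195, 169, 108, 108, 111]

// Java bytes range from -128 to 127; shift values above 127 so they
// line up with what getBytes("UTF-8") would print.
var signed = plain.map(function (b) { return b > 127 ? b - 256 : b; });
// [104, -61, -87, 108, 108, 111]

console.log(plain, signed);
```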
