JavaScript: Shouldn't JSON.stringify escape Unicode characters?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12271547/

Time: 2020-08-23 07:34:34  Source: igfitidea

Tags: javascript, json, unicode

Asked by Ates Goral

I have a simple test page in UTF-8 where text with letters in multiple different languages gets stringified to JSON:

http://jsfiddle.net/Mhgy5/

HTML:

<textarea id="txt">
検索 ? Busca ? S?k ? 搜尋 ? Tìm ki?m ? Пошук ? Cerca ? S?k ? Haku ? Hledání ? Keresés ? ?? ? Cari ? Ara ? ????? ? C?utare ? ??? ? H?ada? ? S?g ? Ser?u ? Претрага ? Paie?ka ? Poi??i ? Cari ? ????? ? Търсене ? ?здеу ? Bilatu ? Suk ? Bilnga ? Tra?i ? ?????
</textarea>
<button id="encode">Encode</button>
<pre id="out">
</pre>

JavaScript:

$("#encode").click(function () {
    $("#out").text(JSON.stringify({ txt: $("#txt").val() }));
}).click();

While I expect the non-ASCII characters to be escaped as \uXXXX as per the JSON spec, they seem to be untouched. Here's the output I get from the above test:

{"txt":"検索 ? Busca ? S?k ? 搜尋 ? Tìm ki?m ? Пошук ? Cerca ? S?k ? Haku ? Hledání ? Keresés ? ?? ? Cari ? Ara ? ????? ? C?utare ? ??? ? H?ada? ? S?g ? Ser?u ? Претрага ? Paie?ka ? Poi??i ? Cari ? ????? ? Търсене ? ?здеу ? Bilatu ? Suk ? Bilnga ? Tra?i ? ?????\n"}

I'm using Chrome, so it should be the native JSON.stringify implementation. The page's encoding is UTF-8. Shouldn't the non-ASCII characters be escaped?

What brought me to this test in the first place is that I noticed jQuery.ajax doesn't seem to escape non-ASCII characters when they appear in a data object property. The characters seem to be transmitted as UTF-8.

Answered by Rob W

The JSON spec does not demand the conversion of Unicode characters to escape sequences. "Any UNICODE character except " or \ or control character." is defined to be valid inside a JSON-serialized string:

[Image: JSON string format]
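
The rule quoted above can be checked directly in a console. This is a minimal sketch, not part of the original answer; the variable names `plain` and `special` are made up for illustration. It shows that non-ASCII characters pass through JSON.stringify untouched, while a quote, a backslash and a control character (here a tab) are escaped.

```javascript
// Non-ASCII characters are valid inside a JSON string, so they are left as-is.
var plain = JSON.stringify({ txt: "検索 Пошук" });
console.log(plain); // prints {"txt":"検索 Пошук"}

// The input string is: a " b \ c <TAB> d
// The quote, the backslash and the tab must be escaped in JSON.
var special = JSON.stringify('a"b\\c\td');
console.log(special); // prints "a\"b\\c\td"
```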

Answered by Csongor Halmai

The short answer to your question is no; JSON.stringify shouldn't escape your string.

However, handling UTF-8 strings can seem strange if you save your HTML file with UTF-8 encoding but don't declare it to be a UTF-8 file.

For example:

<!doctype html>
<html>
    <head>
        <title></title>
        <script>
            var data="árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP";
            alert(JSON.stringify(data));
        </script>
    </head>
</html>

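This would alert a garbled string, because the browser may decode the undeclared UTF-8 bytes with a legacy default encoding: "??rv?-zt?±r?‘ t??k??rf?or?3g??p ?RV?ZT?°R? T??K?–RF??R?“G?‰P".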

But if you add the following line to the head section:

<meta charset="UTF-8">

Then the alert will be what one would expect: "árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP".

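The garbling described above happens before JSON.stringify ever runs, and it can be simulated without an HTML page. The following is a rough sketch under stated assumptions: the helper name `misdecodeAsLatin1` is hypothetical, and ISO-8859-1 is assumed as the fallback encoding purely for illustration (the encoding a browser actually falls back to depends on its locale and settings).

```javascript
// Hypothetical helper: encode text as UTF-8, then wrongly decode each byte
// as a single ISO-8859-1 character (assumed fallback encoding).
function misdecodeAsLatin1(text) {
    var bytes = new TextEncoder().encode(text); // UTF-8 bytes
    return Array.from(bytes, function (b) {
        return String.fromCharCode(b);
    }).join("");
}

// "\u00e1" is the letter á, which is the two bytes C3 A1 in UTF-8.
var garbled = misdecodeAsLatin1("\u00e1");
console.log(garbled); // prints Ã¡

// JSON.stringify just serializes whatever string it is given, garbled or not.
console.log(JSON.stringify(garbled)); // prints "Ã¡"
```

In other words, the strange output comes from the page being decoded incorrectly, not from anything JSON.stringify does.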

Answered by GolezTrol

No. The preferred encoding for JSON is UTF-8, so those characters do not need to be escaped.

You are allowed to escape Unicode characters if you want to be safer or to explicitly send the JSON in a different encoding (that is, pure ASCII), but doing so goes against recommendations.

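If pure-ASCII output is really wanted, one possible approach is to post-process the result of JSON.stringify. This is only a sketch, not something from the original answers: the helper name `stringifyAscii` is hypothetical, and it assumes the value is serializable (JSON.stringify returns undefined for some inputs, which this helper does not handle).

```javascript
// Hypothetical helper: serialize with JSON.stringify, then rewrite every
// UTF-16 code unit from U+007F upward as a \uXXXX escape sequence.
function stringifyAscii(value) {
    return JSON.stringify(value).replace(/[\u007f-\uffff]/g, function (unit) {
        return "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0");
    });
}

var ascii = stringifyAscii({ txt: "Пошук" });
console.log(ascii); // prints {"txt":"\u041f\u043e\u0448\u0443\u043a"}

// Parsing the escaped text gives back the original string.
console.log(JSON.parse(ascii).txt === "Пошук"); // prints true
```

Because the replacement works on UTF-16 code units, characters outside the Basic Multilingual Plane come out as a pair of \uXXXX escapes, which is the form JSON itself uses for them.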

Answered by Kerrek SB

Your claim is just not true. JSON strings consist of Unicode code points (except '"' and '\', which must be escaped, as must control characters); that's all. The entire JSON document can be encoded in UTF-8, UTF-16 or UTF-32, at the discretion of the producer. Additionally, strings can contain escape sequences, which provide an alternative form of naming code points, as opposed to including them literally.

If the distinction between the two still eludes you, here's an example of two different ways of writing the same string in JSON:

  • "A"

  • "\u0041"

Both versions represent the same string, which consists of the single code point U+0041, which is A.

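The equivalence of the two spellings can be confirmed with JSON.parse. A small sketch, with variable names chosen only for illustration:

```javascript
// Two different JSON texts for the same one-character string.
var literal = JSON.parse('"A"');
var escaped = JSON.parse('"\\u0041"'); // the JSON text here is "\u0041"

console.log(literal === escaped);     // prints true
console.log(JSON.stringify(escaped)); // prints "A"
```

Once parsed, nothing distinguishes the two: the escape sequence is only a property of the JSON text, not of the resulting string.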