Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/583562/

Date: 2020-09-03 17:11:48  Source: igfitidea

JSON character encoding - is UTF-8 well-supported by browsers or should I use numeric escape sequences?

Tags: web-services, json, unicode, utf-8

Asked by schickb

I am writing a webservice that uses json to represent its resources, and I am a bit stuck thinking about the best way to encode the json. Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8. But the rfc also describes a string escaping mechanism for specifying characters. I assume this would generally be used to escape non-ascii characters, thereby making the resulting utf-8 valid ascii.

So let's say I have a json string that contains unicode characters (code-points) that are non-ascii. Should my webservice just utf-8 encode that and return it, or should it escape all those non-ascii characters and return pure ascii?

I'd like browsers to be able to execute the results using jsonp or eval. Does that affect the decision? My knowledge of various browsers' javascript support for utf-8 is lacking.

EDIT: I wanted to clarify that my main concern about how to encode the results is really about browser handling of the results. What I've read indicates that browsers may be sensitive to the encoding when using JSONP in particular. I haven't found any really good info on the subject, so I'll have to start doing some testing to see what happens. Ideally I'd like to only escape those few characters that are required and just utf-8 encode the results.

Accepted answer by thomasrutter

The JSON spec requires UTF-8 support by decoders. As a result, all JSON decoders can handle UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for Javascript interpreters, which means JSONP will handle the UTF-8 encoded JSON as well.
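
As a rough illustration of that equivalence, here is a minimal Python sketch. Python and its standard json module are my choice for illustration only; the answer itself names no language. It decodes the same string once from raw UTF-8 bytes and once from a pure-ASCII numeric escape, and both come out as the same text.

```python
import json

# The same JSON string value, written two ways.
raw_utf8 = '"caf\u00e9"'.encode("utf-8")  # é stored as its two-byte UTF-8 form
escaped = b'"caf\\u00e9"'                 # pure ASCII, é written as the escape \u00e9

# json.loads accepts bytes (Python 3.6+) and treats them as UTF-8 here.
decoded_raw = json.loads(raw_utf8)
decoded_escaped = json.loads(escaped)

# A conforming decoder yields the same text either way.
print(decoded_raw == decoded_escaped)
```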

The ability for JSON encoders to use the numeric escape sequences instead just offers you more choice. One reason you may choose the numeric escape sequences would be if a transport mechanism in between your encoder and the intended decoder is not binary-safe.

Another reason you may want to use numeric escape sequences is to prevent certain characters appearing in the stream, such as <, & and ", which may be interpreted as HTML sequences if the JSON code is placed without escaping into HTML or a browser wrongly interprets it as HTML. This can be a defence against HTML injection or cross-site scripting (note: some characters MUST be escaped in JSON, including " and \).
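
A minimal sketch of that idea in Python follows. The helper name dumps_html_safe and the choice of exactly three characters (<, > and &) are assumptions made for illustration; the " character is already escaped by any JSON encoder. This is not a complete cross-site scripting defence, only a demonstration that the escaped output still decodes to the original data.

```python
import json

# Characters that an HTML parser treats specially, mapped to JSON numeric escapes.
HTML_SENSITIVE = {"<": "\\u003c", ">": "\\u003e", "&": "\\u0026"}

def dumps_html_safe(value):
    """Hypothetical helper: serialize to JSON, then escape HTML-significant characters."""
    text = json.dumps(value, ensure_ascii=False)
    # In json.dumps output these characters can only occur inside string
    # values, so replacing them with \uXXXX escapes keeps the JSON valid.
    for char, escape in HTML_SENSITIVE.items():
        text = text.replace(char, escape)
    return text

payload = {"comment": "</script><b>é & co</b>"}
encoded = dumps_html_safe(payload)
print(encoded)
```

Other non-ASCII characters such as é are left as UTF-8 here; only the characters with meaning in HTML are turned into escapes.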

Some frameworks, including PHP's implementation of JSON, do the numeric escape sequences on the encoder side by default for any character outside of ASCII. This is intended for maximum compatibility with limited transport mechanisms and the like. However, this should not be interpreted as an indication that JSON decoders have a problem with UTF-8.

So, I guess you could just decide which to use like this:

  • Just use UTF-8, unless your method of storage or transport between encoder and decoder is not binary-safe.

  • Otherwise, use the numeric escape sequences.
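
The rule of thumb above could be sketched in Python roughly as follows. The function encode_json and its binary_safe_channel flag are hypothetical names introduced for this example; ensure_ascii is the real json.dumps parameter that switches between the two output styles.

```python
import json

def encode_json(value, binary_safe_channel=True):
    """Hypothetical helper: pick raw UTF-8 or numeric escapes for the output."""
    # ensure_ascii=False keeps non-ASCII characters as they are;
    # ensure_ascii=True writes them as \uXXXX escape sequences instead.
    text = json.dumps(value, ensure_ascii=not binary_safe_channel)
    # Encoding as UTF-8 is safe in both cases: escaped output is pure ASCII.
    return text.encode("utf-8")

data = {"name": "é"}
as_utf8 = encode_json(data)                              # raw UTF-8 bytes
as_ascii = encode_json(data, binary_safe_channel=False)  # ASCII-only bytes
print(as_utf8, as_ascii)
```

Either output decodes to the same data, so the choice only depends on what the channel between encoder and decoder can carry.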

Answer by thomasrutter

I had a problem there. When I JSON encode a string with a character like "é", every browser will return the same "é", except IE which will return "\u00e9".

Then with PHP json_decode(), it will fail if it finds "é", so for Firefox, Opera, Safari and Chrome, I have to call utf8_encode() before json_decode().

Note: in my tests, IE and Firefox are using their native JSON object, other browsers are using json2.js.
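
One plausible reading of this symptom, and it is only an assumption since the answer does not say which encoding the browsers sent, is that the "é" reached the server as a single ISO-8859-1 byte rather than as UTF-8. That would fit the fix, because utf8_encode() converts ISO-8859-1 to UTF-8. The Python sketch below reproduces the situation with the standard json module; try_decode is a hypothetical helper name.

```python
import json

latin1_body = '"é"'.encode("latin-1")  # one byte 0xE9, not valid UTF-8
utf8_body = '"é"'.encode("utf-8")      # two bytes 0xC3 0xA9

def try_decode(body):
    """Hypothetical helper: return the decoded value, or None if decoding fails."""
    try:
        return json.loads(body)
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None

failed = try_decode(latin1_body)
ok = try_decode(utf8_body)
# Convert the Latin-1 bytes to UTF-8 first, similar in spirit to utf8_encode().
repaired = try_decode(latin1_body.decode("latin-1").encode("utf-8"))
print(failed, ok, repaired)
```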

Answer by chaos

ASCII isn't in it any more. Using UTF-8 encoding means that you aren't using ASCII encoding. What you should use the escaping mechanism for is what the RFC says:

All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)
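
To see that rule in action, here is a small Python sketch (Python is used for illustration only). With ensure_ascii=False, the standard json module escapes the quotation mark, the reverse solidus and the control character, and leaves é untouched. The sample string is made up for the example.

```python
import json

# A quotation mark, a backslash, a newline (control character) and a non-ASCII letter.
sample = 'quote " backslash \\ newline \n e-acute é'

# ensure_ascii=False asks the encoder to leave non-ASCII characters unescaped.
encoded = json.dumps(sample, ensure_ascii=False)
print(encoded)
```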

Answer by Ankit Sewadik

I was facing the same problem. This works for me; please check it:

json_encode($array,JSON_UNESCAPED_UNICODE);

Answer by Remy Lebeau

Reading the json rfc (http://www.ietf.org/rfc/rfc4627.txt) it is clear that the preferred encoding is utf-8.

FYI, RFC 4627 is no longer the official JSON spec. It was obsoleted in 2014 by RFC 7159, which was then obsoleted in 2017 by RFC 8259, which is the current spec.

RFC 8259 states:

8.1. Character Encoding

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.

Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
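
A minimal Python sketch of both halves of that byte order mark rule follows. The function names encode_for_network and decode_tolerating_bom are hypothetical; the "utf-8" and "utf-8-sig" codecs and codecs.BOM_UTF8 are part of the Python standard library.

```python
import codecs
import json

def encode_for_network(value):
    """Hypothetical sender: the plain "utf-8" codec never prepends a BOM."""
    return json.dumps(value, ensure_ascii=False).encode("utf-8")

def decode_tolerating_bom(body):
    """Hypothetical receiver: ignore a leading BOM instead of rejecting it."""
    # "utf-8-sig" strips a leading BOM if present, otherwise acts like "utf-8".
    return json.loads(body.decode("utf-8-sig"))

wire = encode_for_network({"name": "é"})
with_bom = codecs.BOM_UTF8 + wire  # what a non-conforming sender might emit

print(decode_tolerating_bom(wire) == decode_tolerating_bom(with_bom))
```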

Answer by Paul Smith

I had a similar problem with the é character. I think the comment "it's possible that the text you're feeding it isn't UTF-8" is probably close to the mark here. I have a feeling the default collation in my instance was something else until I noticed and changed it to utf8. The problem is that the data was already there, so I'm not sure whether it was converted when I changed the collation; it displays fine in MySQL Workbench. The end result is that PHP will not JSON-encode the data; json_encode() just returns false. It doesn't matter what browser you use, as it's the server causing my issue: PHP will not parse the data as UTF-8 if this character is present. Like I say, I'm not sure if this is due to converting the schema to utf8 after the data was present or just a PHP bug. In this case use json_encode(utf8_encode($string));
