JSON and escaping characters
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4901133/
Question by Jason S
I have a string that gets serialized to JSON in JavaScript and then deserialized in Java.
It looks like I run into a problem if the string contains a degree symbol.
I could use some help in figuring out who to blame:
- Is it the SpiderMonkey 1.8 implementation (which has a built-in JSON implementation)?
- Is it Google Gson?
- Is it me, for not doing something properly?
Here's what happens in JSDB:
js>s='15\u00f8C'
15°C
js>JSON.stringify(s)
"15°C"
I would have expected "15\u00f8C", which leads me to believe that SpiderMonkey's JSON implementation isn't doing the right thing... except that the syntax description on the JSON homepage (is that the spec?) says that a char can be
any-Unicode-character-except-"-or-\-or-control-character
So maybe it passes the string along as-is without encoding it as \u00f8, in which case I would think the problem is with the Gson library.
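To see what JSON.stringify() actually emits, independent of how the console renders it, you can list the code points of its output. This is a sketch for a modern engine (it uses spread syntax, codePointAt and the ES2017 padStart), not for JSDB or SpiderMonkey 1.8 specifically. For reference, U+00F8 is "ø" (LATIN SMALL LETTER O WITH STROKE) and the degree sign is U+00B0. In code page 437, a common Windows console code page, the byte 0xF8 is drawn as "°", which may be why the console shows a degree sign here; that is an assumption about the asker's setup, not something stated in the question.

```javascript
// Degree sign, for comparison with the character actually in the string.
const DEGREE_SIGN = '\u00b0';

// The string from the question: U+00F8 is LATIN SMALL LETTER O WITH STROKE ("ø").
const s = '15\u00f8C';
const json = JSON.stringify(s);

// Describe each code point of a text as U+XXXX instead of
// trusting how the console renders it.
function codePoints(text) {
  return [...text].map(
    (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(json.length);                // 6: two quotes plus four characters
console.log(codePoints(json).join(' ')); // U+0022 U+0031 U+0035 U+00F8 U+0043 U+0022
console.log(json.includes(DEGREE_SIGN)); // false
```

The output shows that JSON.stringify() passes U+00F8 through unescaped, which the JSON grammar quoted above allows.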
Can anyone help?
I suppose my workaround is either to use a different JSON library or to escape strings manually after calling JSON.stringify(); but if this is a bug, I'd like to file a bug report.
Answer by McDowell
This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:
2.5. Strings
The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
Any character may be escaped.
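Both parts of the quoted rule can be checked with any standards-compliant JSON object; the sketch below assumes a modern engine rather than SpiderMonkey 1.8. An escaped and an unescaped degree sign decode to the same string, while JSON.stringify() still escapes the characters that must be escaped (quotation mark, reverse solidus, control characters).

```javascript
// The same string written two ways in JSON text: raw and \u-escaped.
const raw = '"15\u00b0C"';      // JSON text containing the literal "°"
const escaped = '"15\\u00b0C"'; // JSON text containing the six characters \u00b0

// Escaping is optional: both forms decode to the identical string.
console.log(JSON.parse(raw) === JSON.parse(escaped)); // true

// JSON.stringify() does not escape the degree sign...
console.log(JSON.stringify('15\u00b0C')); // "15°C"

// ...but it does escape quotation marks, backslashes and control characters.
const tricky = 'say "hi"\\path\n\u0001';
console.log(JSON.stringify(tricky)); // "say \"hi\"\\path\n\u0001"
```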
Escaping everything inflates the size of the data: every code point can be represented in four or fewer bytes in any Unicode transformation format, whereas escaping turns each one into six or twelve characters.
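To put numbers on the size argument, here is a small sketch comparing the UTF-8 size of a character with the size of its \uXXXX escape. It assumes a runtime that provides the standard TextEncoder (modern browsers, Node.js 11+); escapeAll is a hypothetical helper written for this illustration. Characters outside the Basic Multilingual Plane are stored as a UTF-16 surrogate pair, so they need two escapes, which is where the twelve comes from.

```javascript
const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// Hypothetical helper: one \uXXXX escape per UTF-16 code unit, as JSON does.
function escapeAll(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    out += '\\u' + str.charCodeAt(i).toString(16).padStart(4, '0');
  }
  return out;
}

for (const ch of ['A', '\u00b0', '\u20ac', '\u{1F600}']) {
  const utf8Bytes = encoder.encode(ch).length;
  // The escaped form is pure ASCII, so it takes one UTF-8 byte per character.
  const escapedBytes = encoder.encode(escapeAll(ch)).length;
  console.log(escapeAll(ch), utf8Bytes, escapedBytes);
}
// \u0041 1 6
// \u00b0 2 6
// \u20ac 3 6
// \ud83d\ude00 4 12
```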
It is more likely that you have a text transcoding bug somewhere in your code, and escaping everything outside the ASCII subset masks the problem. The JSON spec requires that all data use a Unicode encoding.
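A Node.js sketch of what such a transcoding bug can look like, and why escaping hides it. It assumes, purely for illustration, a receiver that mistakenly decodes UTF-8 bytes as ISO-8859-1 (Latin-1); asciiOnly is a hypothetical helper, not part of any library. Buffer and its 'utf8' and 'latin1' encodings are standard Node.js APIs.

```javascript
// Sender: serialize to JSON and encode as UTF-8 (the correct encoding).
const text = '15\u00b0C';
const utf8Bytes = Buffer.from(JSON.stringify(text), 'utf8');

// Buggy receiver: decodes the bytes as Latin-1 instead of UTF-8.
// The degree sign's two UTF-8 bytes (0xC2 0xB0) become two characters.
const garbled = JSON.parse(utf8Bytes.toString('latin1'));
console.log(garbled); // 15Â°C

// Hypothetical helper: escape every UTF-16 code unit from U+007F up,
// so the JSON text is pure ASCII.
function asciiOnly(json) {
  return json.replace(/[\u007f-\uffff]/g,
    (c) => '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0'));
}

// Pure-ASCII bytes read the same in UTF-8 and Latin-1, so the wrong decoder
// now happens to produce the right string and the bug goes unnoticed.
const asciiBytes = Buffer.from(asciiOnly(JSON.stringify(text)), 'utf8');
const masked = JSON.parse(asciiBytes.toString('latin1'));
console.log(masked === text); // true
```

The fix suggested by the answer is to make the decoder use the right encoding, not to add the escaping step.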
Answer by Jason S
Hmm, well, here's a workaround anyway:
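function JSON_stringify(s, emit_unicode)
{
    var json = JSON.stringify(s);
    // With emit_unicode, return the output of JSON.stringify() unchanged.
    // Otherwise replace every UTF-16 code unit from U+007F upward with a
    // \uXXXX escape (lowercase hex, zero-padded to four digits).
    return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
        function(c) {
            // '\\u' is a literal backslash followed by "u"
            return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
        }
    );
}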
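Because the function only post-processes the JSON text, its escaped output should parse back to the original value. The sketch below uses a hypothetical restatement, toAsciiJson, that swaps the '0000' slice for ES2017 padStart (a purely stylistic change), and checks the round trip for a string and for an object containing a character outside the BMP, which comes out as a surrogate pair of two escapes.

```javascript
// Same idea as JSON_stringify(s, false): escape every UTF-16 code unit >= U+007F.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u007f-\uffff]/g,
    (c) => '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0'));
}

const samples = [
  '15\u00f8C 3\u0111',
  { temp: '15\u00b0C', face: '\u{1F600}' },
];

for (const value of samples) {
  const json = toAsciiJson(value);
  // Round trip: re-serializing the parsed value must match the original's JSON.
  const roundTrips = JSON.stringify(JSON.parse(json)) === JSON.stringify(value);
  console.log(json, roundTrips);
}
// "15\u00f8C 3\u0111" true
// {"temp":"15\u00b0C","face":"\ud83d\ude00"} true
```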
Test case:
js>s='15\u00f8C 3\u0111';
15°C 3?
js>JSON_stringify(s, true)
"15°C 3?"
js>JSON_stringify(s, false)
"15\u00f8C 3\u0111"

