JSON and escaping characters

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4901133/

Date: 2020-09-03 17:44:02  Source: igfitidea

Tags: json, unicode

Asked by Jason S

I have a string which gets serialized to JSON in Javascript, and then deserialized to Java.

It looks like if the string contains a degree symbol, then I get a problem.

I could use some help in figuring out who to blame:

  • is it the Spidermonkey 1.8 implementation? (this has a JSON implementation built-in)
  • is it Google gson?
  • is it me for not doing something properly?

Here's what happens in JSDB:

js>s='15\u00f8C'
15°C
js>JSON.stringify(s)
"15°C"

I would have expected "15\u00f8C", which leads me to believe that Spidermonkey's JSON implementation isn't doing the right thing... except that the JSON homepage's syntax description (is that the spec?) says that a char can be

any-Unicode-character-except-"-or-\-or-control-character

so maybe it passes the string along as-is without encoding it as \u00f8... in which case I would think the problem is with the gson library.

Can anyone help?

I suppose my workaround is either to use a different JSON library or to manually escape strings myself after calling JSON.stringify(), but if this is a bug then I'd like to file a bug report.

Answered by McDowell

This is not a bug in either implementation. There is no requirement to escape U+00B0. To quote the RFC:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.
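
A minimal sketch of what the quoted rule means in practice, assuming an engine whose built-in JSON.stringify follows it: the quotation mark, the backslash, and control characters come out escaped, while U+00B0 (the degree sign named in this answer) passes through unchanged. The variable names are illustrative only.

```javascript
// Characters JSON requires to be escaped: ", \ and U+0000 through U+001F.
const mustEscape = JSON.stringify('say "hi" \\ tab:\t');

// A non-ASCII character such as U+00B0 is legal inside a JSON string as-is.
const degree = JSON.stringify('15\u00b0C');

console.log(mustEscape); // "say \"hi\" \\ tab:\t"
console.log(degree);     // "15°C"

// The raw and the escaped spellings are two encodings of the same string.
const sameValue = JSON.parse('"15\u00b0C"') === JSON.parse('"15\\u00b0C"');
console.log(sameValue);  // true
```

Since both spellings parse to the same value, a conforming parser on the receiving side should not care which one the sender chose.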

Escaping everything inflates the size of the data (all code points can be represented in four or fewer bytes in all Unicode transformation formats; whereas encoding them all makes them six or twelve bytes).
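
The size difference can be checked with a rough sketch like the following. It assumes UTF-8 as the transformation format and a runtime that provides TextEncoder; escapeNonAscii and utf8Length are hypothetical helpers written for this illustration, not part of any library.

```javascript
// Hypothetical helper: escape every UTF-16 code unit from U+007F up as \uXXXX.
function escapeNonAscii(text) {
  return text.replace(/[\u007f-\uffff]/g, function (c) {
    return '\\u' + c.charCodeAt(0).toString(16).padStart(4, '0');
  });
}

// Hypothetical helper: number of bytes the text occupies when encoded as UTF-8.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

const degree = '\u00b0';   // inside the Basic Multilingual Plane
const emoji = '\u{1F600}'; // outside it, so two UTF-16 code units

console.log(utf8Length(degree), utf8Length(escapeNonAscii(degree))); // 2 6
console.log(utf8Length(emoji), utf8Length(escapeNonAscii(emoji)));   // 4 12
```

The twelve-byte case arises because a code point above U+FFFF has to be written as two \uXXXX escapes, one per surrogate.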

It is more likely that you have a text transcoding bug somewhere in your code, and escaping everything into the ASCII subset masks the problem. It is a requirement of the JSON spec that all data use a Unicode encoding.
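
To illustrate the kind of transcoding bug meant here, the sketch below assumes Node.js (for Buffer) and assumes, purely for the sake of example, that the receiver decodes UTF-8 bytes as Latin-1. Nothing in the question confirms that this is the actual mismatch; it only shows why an all-ASCII payload would hide it.

```javascript
// Sender: serialize and write the bytes as UTF-8.
const json = JSON.stringify('15\u00b0C'); // "15°C", degree sign not escaped
const bytes = Buffer.from(json, 'utf8');

// Receiver: the same bytes decoded with the wrong and the right charset.
const decodedWrong = JSON.parse(bytes.toString('latin1'));
const decodedRight = JSON.parse(bytes.toString('utf8'));
console.log(decodedWrong); // 15Â°C
console.log(decodedRight); // 15°C

// An escaped payload is pure ASCII, so the wrong charset goes unnoticed.
const escapedBytes = Buffer.from('"15\\u00b0C"', 'utf8');
const decodedEscaped = JSON.parse(escapedBytes.toString('latin1'));
console.log(decodedEscaped); // 15°C
```

The escaped form works only because ASCII bytes mean the same thing in both charsets; the mismatch itself is still there.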

Answered by Jason S

hmm, well here's a workaround anyway:

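// Like JSON.stringify, but escapes every code unit from U+007F up as \uXXXX
// unless emit_unicode is true, so the output is plain ASCII.
function JSON_stringify(s, emit_unicode)
{
   var json = JSON.stringify(s);
   return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g,
      function(c) {
        // '\\u' is a literal backslash plus 'u'; a bare '\u' is a syntax error
        return '\\u'+('0000'+c.charCodeAt(0).toString(16)).slice(-4);
      }
   );
}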
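
As a quick sanity check on the workaround, the sketch below restates the function so it runs on its own, then confirms that the escaped output is ASCII-only and parses back to the original string. The sample string uses U+00B0, the actual degree sign, which is an assumption about what was intended.

```javascript
// Restated from the workaround above so this snippet is self-contained.
function JSON_stringify(s, emit_unicode) {
  var json = JSON.stringify(s);
  return emit_unicode ? json : json.replace(/[\u007f-\uffff]/g, function (c) {
    return '\\u' + ('0000' + c.charCodeAt(0).toString(16)).slice(-4);
  });
}

var s = '15\u00b0C 3\u0111'; // U+00B0 degree sign, U+0111 d with stroke
var escaped = JSON_stringify(s, false);

console.log(escaped);                        // "15\u00b0C 3\u0111"
console.log(/^[\x00-\x7e]*$/.test(escaped)); // true: ASCII only
console.log(JSON.parse(escaped) === s);      // true: round trip is lossless
```

Because the replacement runs on the already serialized text, it only changes how characters are spelled inside the JSON, not the value a parser reconstructs.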

test case:

js>s='15\u00f8C 3\u0111';
15°C 3?
js>JSON_stringify(s, true)
"15°C 3?"
js>JSON_stringify(s, false)
"15\u00f8C 3\u0111"