Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6019327/

Time: 2020-09-03 17:51:17  Source: igfitidea

Json parsing with unicode characters

Tags: json, unicode

Asked by André Alçada Padez

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried in Flash CS5, the JSON library, and I have tried it in http://json.parser.online.fr/ and I always get "unexpected token - eval fails".

I'm sorry, there really was a problem with the syntax; it came this way from the client.

Can someone please help me? Thanks

Answer by Mike Baranczak

Quoth the RFC:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

So a correctly encoded Unicode character should not be a problem, which leads me to believe that it's not correctly encoded (maybe it uses Latin-1 instead of UTF-8). How did you create the file? In a text editor?
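
To make the point concrete, here is a minimal Python sketch (Python is not part of this answer; the sample document and the helper name `try_parse` are made up for illustration). It encodes the same JSON text once as UTF-8 and once as Latin-1 and checks which of the two byte sequences decodes cleanly.

```python
import json


def try_parse(raw: bytes):
    """Parse raw as UTF-8 JSON; return the decode error instead if raw is not valid UTF-8."""
    try:
        return json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError as exc:
        return exc


# Hypothetical sample document containing one non-ASCII character.
doc = '{"author": "André"}'

good = try_parse(doc.encode("utf-8"))    # bytes as the RFC expects them
bad = try_parse(doc.encode("latin-1"))   # same text saved in the wrong encoding

print(good)
print(type(bad).__name__)
```

If a file behaves like the second call, re-saving it as UTF-8 in the editor is probably the simplest fix.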

Answer by knb

There might be an obscure Unicode whitespace character hidden in your string.

This URL contains more detail:

http://timelessrepo.com/json-isnt-a-javascript-subset
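
One way to hunt for such a character is sketched below in Python (not the asker's tooling; the function name `find_suspect_chars` and the sample input are made up for illustration). It lists every separator, format, or control character other than the four whitespace characters JSON allows between tokens. Note that it also flags such characters inside string values, where they can be perfectly legal, so treat the output as a list of places to look rather than a list of errors.

```python
import json
import unicodedata

# Whitespace that JSON allows between tokens: space, tab, line feed, carriage return.
JSON_WHITESPACE = " \t\n\r"


def find_suspect_chars(text: str):
    """List (index, code point, name) for separator, format and control characters
    other than the four JSON whitespace characters."""
    hits = []
    for index, ch in enumerate(text):
        if ch in JSON_WHITESPACE:
            continue
        if unicodedata.category(ch) in ("Zs", "Zl", "Zp", "Cf", "Cc"):
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


# Hypothetical input: a no-break space (U+00A0) sits between the colon and the value.
sample = '{"a":\u00a01}'
print(find_suspect_chars(sample))

try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    print("parse failed:", exc.msg)
```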

Answer by Flash

In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.

The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it. So in the end I wrote a function that converts the hex 27 to decimal 39, so that I ended up with HTML encoding, and then decoded that.

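    // Requires "using System;" and "using System.Globalization;".
    // HTML is a string holding text with literal \uXXXX escapes.
    string Padding = "000";
    for (int f = 1; f <= 256; f++)
    {
        // "\\u" is a literal backslash plus u; a bare "\u" does not compile in C#.
        string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
        // The decimal digits of f are re-read as hex: "27" -> 39 -> "&#39;".
        // Only escapes whose hex digits are all 0-9 are covered (not \u00fc, for example).
        string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
        HTML = HTML.Replace(Hex, Dec);
    }
    HTML = System.Web.HttpUtility.HtmlDecode(HTML);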

Ugly as sin, I know, but without the latest framework (it is not on the ISP's server) it was the best I could do. Someone must know a better solution.
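
For comparison, here is a minimal sketch of the same decoding step in Python rather than this answer's C# (the helper name `unescape_u` is made up for illustration). A JSON parser already turns `\u0027` into `'` when the text is a complete JSON value; for loose text, a regular expression can replace every `\uXXXX` escape, including ones with hex digits a-f. Escapes that form surrogate pairs are not recombined in this sketch.

```python
import json
import re

# Matches a literal backslash, "u", and exactly four hex digits.
U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_u(text: str) -> str:
    """Replace each literal backslash-u escape with the character it names."""
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# The raw strings below contain a real backslash followed by "u0027" / "u00fc".
print(unescape_u(r"Paul\u0027s"))
print(unescape_u(r"M\u00fcnchen"))

# When the text is a complete JSON value, the parser does the same job.
print(json.loads(r'"Paul\u0027s"'))
```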

Answer by Chamira Fernando

I had the same problem; I just changed the file encoding from Mac-Roman/windows-1252 to UTF-8, and it worked.
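
The same conversion can be scripted instead of done in an editor. The following is a minimal Python sketch, not what this answer used: the function name, file names, and sample content are made up, and it assumes you already know the source encoding (`cp1252` is Python's codec name for windows-1252; `mac_roman` would be the one for Mac-Roman).

```python
import json
import tempfile
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Decode src with source_encoding and write the same text to dst as UTF-8."""
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Demo with a throwaway file saved as windows-1252.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data-cp1252.json"
    dst = Path(tmp) / "data-utf8.json"
    src.write_bytes('{"city": "München"}'.encode("cp1252"))

    reencode_to_utf8(src, dst)
    parsed = json.loads(dst.read_text(encoding="utf-8"))

print(parsed)
```

Working on bytes rather than text mode keeps the line endings of the original file unchanged.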

Answer by Ash

I had the same problem with Twitter json files. I was parsing them in Python with json.loads(tweet) but it failed for half of the records.

I changed to Python3 and it works well now.
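
A minimal Python 3 sketch of that kind of loop, assuming one JSON document per line (the answer does not say how the tweets were stored; the function name and sample lines are made up for illustration). Collecting the failures instead of stopping at the first one makes it easier to see which records are actually malformed.

```python
import json


def parse_records(lines):
    """Parse one JSON document per line; return (records, failures)."""
    records, failures = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            failures.append((number, exc.msg))
    return records, failures


# Hypothetical records: an escaped character, a raw non-ASCII character, and a broken line.
sample_lines = [
    '{"text": "caf\\u00e9"}',
    '{"text": "naïve"}',
    '{"text": "unterminated}',
]
records, failures = parse_records(sample_lines)
print(records)
print(failures)
```

With a real file, opening it as `open(path, encoding="utf-8")` avoids depending on the platform's default encoding.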

Answer by handle

If you seem to have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting): it encodes to ASCII by default and escapes the Unicode characters! See the questions "python json unicode - how do I eval using javascript", "python: json.dumps can't handle utf-8?" and 'Why does json.dumps escape non-ascii characters with "\uxxxx"'.
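
A short sketch of that default and of the `ensure_ascii=False` switch that turns it off (the sample data is made up for illustration):

```python
import json

data = {"name": "Müller"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)
print(readable)

# Both forms are valid JSON and parse back to the same value.
assert json.loads(escaped) == json.loads(readable) == data
```

When writing the unescaped form to a file, open the file with `encoding="utf-8"` so the characters are stored as UTF-8.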
