Json parsing with unicode characters
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/6019327/
Asked by André Alçada Padez
I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried it in Flash CS5 with the JSON library, and I've tried it at http://json.parser.online.fr/, and I always get "unexpected token - eval fails".
I'm sorry, there really was a problem with the syntax; it came this way from the client.
Can someone please help me? Thanks
Answered by Mike Baranczak
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
So a correctly encoded Unicode character should not be a problem. Which leads me to believe that it's not correctly encoded (maybe it uses latin-1 instead of UTF-8). How did you create the file? In a text editor?
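One way to test the "not correctly encoded" theory is to read the file as raw bytes and see whether it decodes as UTF-8 at all. The following is only a sketch: the helper name load_json_bytes is made up for illustration, and the fallback to latin-1 is an assumption (any single-byte encoding could be the real culprit).

```python
import json

def load_json_bytes(data: bytes):
    """Parse JSON from raw bytes and report which decoding succeeded."""
    try:
        text = data.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as latin-1.
        # latin-1 maps every byte to a character, so this decode never fails,
        # but the result is only right if the file really was latin-1.
        text = data.decode("latin-1")
        encoding = "latin-1"
    return json.loads(text), encoding

utf8_bytes = '{"name": "André"}'.encode("utf-8")
latin1_bytes = '{"name": "André"}'.encode("latin-1")

print(load_json_bytes(utf8_bytes))    # ({'name': 'André'}, 'utf-8')
print(load_json_bytes(latin1_bytes))  # ({'name': 'André'}, 'latin-1')
```

If the second branch is taken for your file, re-saving it as UTF-8 in the editor is probably the simpler long-term fix.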
Answered by knb
There might be an obscure Unicode whitespace character hidden in your string.
This URL contains more detail:
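To hunt for such a hidden character, a small scan over the text can list everything that looks like whitespace or is invisible but is not a plain space. This is a rough sketch, not a complete check: the function name find_suspicious_characters and the choice of Unicode categories are assumptions made for illustration.

```python
import unicodedata

def find_suspicious_characters(text: str):
    """Return (index, code point, name) for characters that are easy to miss."""
    found = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Zs = space separators, Zl/Zp = line/paragraph separators,
        # Cf = invisible format characters (zero-width space, BOM, ...).
        if ch != " " and category in {"Zs", "Zl", "Zp", "Cf"}:
            found.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return found

# A no-break space and a zero-width space hidden in otherwise valid JSON.
sample = '{"a":\u00a01,\u200b"b": 2}'
for hit in find_suspicious_characters(sample):
    print(hit)
# (5, 'U+00A0', 'NO-BREAK SPACE')
# (8, 'U+200B', 'ZERO WIDTH SPACE')
```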
Answered by Flash
In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.
The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it. So in the end I wrote a function that converts the hex 27 to decimal 39, so that I ended up with an HTML-encoded entity, and then decoded that.
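// Needs: using System.Globalization; (for NumberStyles)
string Padding = "000";
for (int f = 1; f <= 256; f++)
{
    // The decimal digits of f are read as hex digits (27 -> \u0027 -> &#39;),
    // so escapes containing a-f, such as \u00fc, are not covered.
    // The backslash must be doubled: "\u" on its own is not a valid C# escape.
    string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
    string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
    HTML = HTML.Replace(Hex, Dec);
}
HTML = System.Web.HttpUtility.HtmlDecode(HTML);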
Ugly as sin, I know, but without the latest framework (it's not on the ISP's server) it was the best I could do. Someone must know a better solution.
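For comparison, the same idea (turn each literal \uXXXX sequence back into its character) can be sketched in Python with a single regular expression instead of a loop over code points. This is not the answer's method, only an illustration; decode_u_escapes is a made-up name, and the sketch deliberately ignores surrogate pairs and escaped backslashes, which a real JSON parser handles for you.

```python
import re

# A literal backslash, 'u', and exactly four hex digits, e.g. \u0027 or \u00fc.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"Paul\u0027s"          # raw string: the backslash stays in the text
print(decode_u_escapes(raw))  # Paul's
```

When the text is a complete JSON document, handing it to a JSON parser is likely the better route, since decoding these escapes is part of parsing.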
Answered by Chamira Fernando
I had the same problem, and I just changed the file encoding from Mac-Roman/windows-1252 to UTF-8, and it worked.
Answered by Ash
I had the same problem with Twitter JSON files. I was parsing them in Python with json.loads(tweet), but it failed for half of the records.
I changed to Python 3 and it works well now.
Answered by handle
If you seem to have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting): it produces ASCII-only output by default and escapes the non-ASCII characters! See python json unicode - how do I eval using javascript (and python: json.dumps can't handle utf-8? and Why does json.dumps escape non-ascii characters with "\uxxxx").
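The default behaviour is easy to see side by side. A minimal sketch with made-up sample data, using the ensure_ascii parameter of json.dumps:

```python
import json

data = {"word": "für"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # {"word": "f\u00fcr"}
print(readable)  # {"word": "für"}

# Both forms are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(readable) == data
```

The escaped form is not broken, it just looks odd in an editor. If you write the readable form to a file, open the file with an explicit encoding such as UTF-8 so the bytes on disk match what readers expect.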

