Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/11868022/
Unicode Characters appearing as Question Marks in Java JSON Parsing
Asked by Sri Gandhi
I have been searching about this for the past few days but I don't think I am able to find a correct pointer. Please merge it with the appropriate question if found as duplicate.
I am pretty new to working with JSON, and as part of one of my projects I need to decode a JSON file and do further processing on it. However, when I tried decoding it using the Json-simple library, I got some weird question marks in the parsed object instead of the actual characters. Sample code is shown below:
String str = "{\"alias\": [\"Evr\u00f3pa\", \"\u05d0\u05d9\u05e8\u05d5\u05e4\"]}";
JSONParser parser = new JSONParser();
JSONObject jsonObject = (JSONObject)parser.parse(str);
System.out.println(jsonObject); // prints {"alias":["Evrópa","?????"]}
I tried using Json-lib too with the same result.
Thanks for the help.
Answered by dsh
The problem isn't with your JSON, it's with your System.out.println(). Those characters can't be represented either in the character encoding of your terminal (or your IDE, if that is where you ran it) or in the encoding that System.out uses in your environment.
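To see which encodings are in play on a given machine, a small diagnostic along these lines may help. It is only a sketch: the class name ShowEncodings and the helper describe() are made up for illustration, and the stdout.encoding property is an assumption that holds only on newer JDKs (it may print as null on older ones).

```java
import java.nio.charset.Charset;

class ShowEncodings {
    // Build a one-line summary of the charset settings the JVM reports.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding")
                // Assumption: only newer JDKs define this property; older ones return null.
                + ", stdout.encoding = " + System.getProperty("stdout.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported charset is something like ISO-8859-1 or windows-1252, Hebrew letters cannot be encoded and will be replaced on output.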
Files cannot contain Unicode characters directly. Files are streams of bytes, whereas a Unicode character is an abstract code point that often needs more than one byte to store (in UTF-8, between one and four bytes; each Hebrew letter in your example takes two). This is where character encodings become relevant. Unicode characters must be converted to a sequence of bytes before they can be written to a file (including System.out). One of the most commonly used encodings for Unicode characters is UTF-8. The trick for programmers is to always use the correct character encoding when converting between bytes and characters. Using the wrong encoding in even a single place, for example in a debug println() call, will give erroneous and misleading output.
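The effect can be reproduced without any JSON library. The sketch below (the class name EncodingDemo and the helper roundTrip are hypothetical) encodes the Hebrew text from the question to bytes and decodes it again, roughly what happens when text passes through a stream in a given charset. ISO-8859-1 is used here only as an example of a charset with no Hebrew letters; the asker's actual console encoding is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encode the text to bytes with the given charset, then decode it back.
    // String.getBytes replaces characters the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String hebrew = "\u05d0\u05d9\u05e8\u05d5\u05e4";

        // The String itself holds five intact characters.
        System.out.println(hebrew.length()); // 5

        // In UTF-8 each of these letters takes two bytes.
        System.out.println(hebrew.getBytes(StandardCharsets.UTF_8).length); // 10

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(hebrew, StandardCharsets.UTF_8).equals(hebrew)); // true

        // ISO-8859-1 has no Hebrew letters, so each one becomes '?'.
        System.out.println(roundTrip(hebrew, StandardCharsets.ISO_8859_1)); // ?????
    }
}
```

This also matches the output in the question: "ó" exists in ISO-8859-1 and survives, while the Hebrew letters do not.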
Answered by kpentchev
You are probably using a default character set that doesn't support the group of special characters. Try using UTF-8 as your charset, something along these lines:
String str = "{\"alias\": [\"Evr\u00f3pa\", \"\u05d0\u05d9\u05e8\u05d5\u05e4\"]}";
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(str.getBytes(Charset.forName("UTF-8"))), Charset.forName("UTF-8"));
JSONParser parser = new JSONParser();
JSONObject jsonObject = (JSONObject)parser.parse(isr);
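Note that reading the input as UTF-8 only affects how the parser receives the text; the question marks appear when the result is written out. A minimal sketch of printing through a writer with an explicit UTF-8 encoding follows. It assumes the terminal or IDE console is itself set to display UTF-8, and it uses a plain string in place of jsonObject.toString() so that it runs without Json-simple. The class name Utf8Output and the helper encodeAsUtf8 are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class Utf8Output {
    // Write the text through a UTF-8 writer and return the raw bytes produced,
    // so the encoding step can be inspected without relying on a console.
    static byte[] encodeAsUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        writer.print(text);
        writer.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for jsonObject.toString() from the snippets above.
        String parsed = "{\"alias\":[\"Evr\u00f3pa\",\"\u05d0\u05d9\u05e8\u05d5\u05e4\"]}";

        // Wrap System.out so characters are encoded as UTF-8 rather than
        // with the platform default. The console must also expect UTF-8.
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        out.println(parsed);

        // The bytes decode back to the original text, so nothing was replaced.
        String decoded = new String(encodeAsUtf8(parsed), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(parsed)); // true
    }
}
```

If the console still shows question marks or garbled characters with this approach, the console's own encoding setting is the next place to look.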