UTF-8 CJK characters not displaying in Java
Original question: http://stackoverflow.com/questions/5965195/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by Twicetimes
I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question:
I have a file which contains some CJK characters, and which has been saved as UTF-8. I have various Asian language packs installed and the characters are rendered properly by other applications, so I know that much works.
In my Java app, I read the file as follows:
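import java.io.*;
import java.nio.charset.Charset;

public class ReadSgf {
    public static void main(String[] args) throws IOException {
        // Create objects: decode the file's bytes as UTF-8
        FileInputStream fis = new FileInputStream(new File("xyz.sgf"));
        InputStreamReader is = new InputStreamReader(fis, Charset.forName("UTF-8"));
        BufferedReader br = new BufferedReader(is);
        // Read and display file contents
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        br.close();
        System.out.println(sb);
    }
}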
The output shows the CJK characters as '???'. A call to is.getEncoding() confirms that it is definitely using UTF-8. What step am I missing to make the characters appear properly? If it makes a difference, I'm looking at the output using the Eclipse console.
Answer by McDowell
System.out.println(sb);
The problem is the above line. This will encode character data using the default system encoding and emit the data to STDOUT. On many systems, this is a lossy process.
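McDowell's point about lossy encoding can be reproduced without a console. The sketch below is not from the answer. It round-trips a CJK string through ISO-8859-1, which is assumed here as a stand-in for a console charset that has no CJK characters. The class and method names are hypothetical. Every unmappable character comes back as '?', which is the same symptom as the '???' in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    // Encodes text to bytes and decodes it back with the same charset.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's default replacement bytes, which is '?' for ISO-8859-1.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String cjk = "漢字";
        // ISO-8859-1 has no CJK characters, so both are lost
        System.out.println(roundTrip(cjk, StandardCharsets.ISO_8859_1)); // prints ??
        // UTF-8 can represent them, so the round trip is lossless
        System.out.println(roundTrip(cjk, StandardCharsets.UTF_8).equals(cjk)); // prints true
    }
}
```

The data is already gone at the encoding step, so no console setting can bring the characters back once they have been turned into '?'.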
If you change the defaults, the encoding used by System.out and the encoding used by the console must match.
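A mismatch fails differently from the lossy case: no data is lost, but the bytes are interpreted with the wrong table. Here is a minimal sketch, assuming a writer that encodes UTF-8 and a reader (the console) that decodes ISO-8859-1. The names are hypothetical. Because ISO-8859-1 maps every byte value to a character, the garbled text can be turned back into the original bytes and decoded correctly.

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Encodes text as UTF-8 but decodes the bytes as ISO-8859-1,
    // mimicking a console whose encoding does not match System.out's.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the mismatch: ISO-8859-1 maps each char back to its original byte.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cjk = "漢";
        String garbled = misdecode(cjk);
        // One CJK character became three Latin-1 characters (one per UTF-8 byte)
        System.out.println(garbled.length()); // prints 3
        System.out.println(repair(garbled).equals(cjk)); // prints true
    }
}
```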
The only supported mechanism to change the default system encoding is via the operating system. (Some will advise using the file.encoding system property, but this is not supported and may have unintended side effects.) You can use System.setOut to install your own custom PrintStream:
PrintStream stdout = new PrintStream(System.out, true, "UTF-8"); // autoFlush, encoding
System.setOut(stdout);
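Fleshed out, the setOut approach might look like this sketch. It assumes Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor, which avoids the checked UnsupportedEncodingException. It also writes to FileDescriptor.out instead of wrapping System.out. Both are choices of this sketch, not of the answer, and the class and method names are hypothetical. The console still has to decode UTF-8; otherwise the characters show up garbled instead of as '?'.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Stdout {
    // Wraps a byte stream in an auto-flushing PrintStream that encodes
    // characters as UTF-8 (this constructor exists since Java 10).
    static PrintStream utf8PrintStream(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Write to the raw stdout file descriptor so the original
        // System.out encoder is not involved at all.
        PrintStream stdout = utf8PrintStream(new FileOutputStream(FileDescriptor.out));
        System.setOut(stdout);
        // Displays correctly only if the console also decodes UTF-8
        System.out.println("漢字 かな 한글");
    }
}
```

On Java versions before 10, the String-named constructor used above (new PrintStream(out, true, "UTF-8")) does the same job but declares UnsupportedEncodingException.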
You can change the Eclipse console encoding via the Run configuration (the Encoding setting on its Common tab).
You can find a number of posts about the subject on my blog, linked from my profile.
Answer by Ed Poor
The following program prints CJK characters to the console using TextPad. To see the Korean Hangul and Japanese Hiragana, I had to tell Java to change the print stream's encoding to EUC_KR and set these properties of TextPad's tool output window:
- font is Arial Unicode MS
- script is Hangul
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Compile with javac -encoding UTF-8 so the string literal is read correctly.
class Hangul {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Re-encode System.out as Korean EUC-KR (EUC-KR also includes the
        // Japanese kana used here); the output window must display Korean too
        PrintStream out = new PrintStream(System.out, true, "EUC_KR");
        System.setOut(out);
        // Print sample to console
        String go_hello = "바둑 こんにちは";
        System.out.println(go_hello);
    }
}
Tool output:
바둑 こんにちは
Answer by asgs
Yeah, you need to change the encoding of the Eclipse console, as explained in the article "how-to-display-chinese-character-in-eclipse-console".
Answer by Mark Rotteveel
Depending on your platform, it is highly likely that your console (or Windows CMD) does not support or use the UTF-8 character set, and therefore converts all unmappable characters to a question mark.
On Windows, for example, CMD almost always uses WIN1252 or a similar single-byte character set.
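One way to check this is to ask a charset's encoder whether it can represent the text at all before printing it. Here is a minimal sketch, assuming windows-1252 as a stand-in for the console's code page. The class and method names are hypothetical. CharsetEncoder.canEncode returns false for text that an encoder would otherwise replace with '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CanEncodeDemo {
    // Reports whether every character of text can be represented in the
    // named charset; a fresh encoder is created for each check.
    static boolean canEncode(String charsetName, String text) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String cjk = "漢字";
        for (String name : new String[] {"windows-1252", "UTF-8"}) {
            // windows-1252 is single-byte Latin, so CJK is unmappable there
            System.out.println(name + " can encode CJK: " + canEncode(name, cjk));
        }
    }
}
```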