Java charset problem on Linux
Original question: http://stackoverflow.com/questions/2168350/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Inv3r53
Problem: I have a string containing special characters, which I convert to bytes and back. The conversion works properly on Windows, but on Linux the special character is not converted properly. The default charset on Linux is UTF-8, as reported by Charset.defaultCharset().displayName().
However, if I run on Linux with the option -Dfile.encoding=ISO-8859-1, it works properly.
How can I make it work with the UTF-8 default charset, without setting the -D option in a Unix environment?
Edit: I use jdk1.6.13.
Edit: the code snippet works with cs = "ISO-8859-1"; or cs = "UTF-8"; on Windows, but not on Linux:
String x = "½";
System.out.println(x);
byte[] ba = x.getBytes(Charset.forName(cs));
for (byte b : ba) {
    System.out.println(b);
}
String y = new String(ba, Charset.forName(cs));
System.out.println(y);
~regards daed
Accepted answer by McDowell
Your characters are probably being corrupted by the compilation process and you're ending up with junk data in your class file.
"If I run on Linux with the option -Dfile.encoding=ISO-8859-1, it works properly."
In short, don't use -Dfile.encoding=...
String x = "½";
Since U+00bd (½) will be represented by different values in different encodings:
windows-1252 BD
UTF-8 C2 BD
ISO-8859-1 BD
...you need to tell your compiler which encoding your source file uses:
javac -encoding ISO-8859-1 Foo.java
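The byte values listed above can be checked with a minimal sketch like the following. The class name HalfBytes and the helper hex are hypothetical, introduced only for illustration. The sketch writes the character as the Unicode escape \u00bd, so the result does not depend on how the source file is encoded or on the -encoding flag.

```java
import java.nio.charset.Charset;

public class HalfBytes {
    // Hypothetical helper: hex-dump the bytes of s when encoded in the named charset.
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escape for U+00BD, independent of the source file encoding.
        String x = "\u00bd";
        for (String cs : new String[] {"windows-1252", "UTF-8", "ISO-8859-1"}) {
            System.out.println(cs + " " + hex(x, cs));
        }
    }
}
```

If a literal ½ in the source file produced different bytes than these, that would suggest the compiler read the file with the wrong encoding.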
Now we get to this one:
System.out.println(x);
System.out is a PrintStream, so it encodes the characters to the system default encoding before emitting the byte data, roughly like this:
System.out.write(x.getBytes(Charset.defaultCharset()));
That may or may not work as you expect on some platforms: the byte encoding must match the encoding the console expects for the characters to show up correctly.
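If the console encoding is known, one option is to wrap System.out in a PrintStream with an explicit charset instead of relying on the default. This is only a sketch: it assumes the terminal expects UTF-8, and the class name ExplicitConsoleEncoding and helper encodeVia are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitConsoleEncoding {
    // Hypothetical helper: push text through a PrintStream that uses the
    // named charset and return the raw bytes it emitted.
    static byte[] encodeVia(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the terminal is configured for UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("\u00bd");

        // The same character takes a different number of bytes per encoding.
        System.out.println(encodeVia("\u00bd", "UTF-8").length);      // 2
        System.out.println(encodeVia("\u00bd", "ISO-8859-1").length); // 1
    }
}
```

Whether the character then displays correctly still depends on the terminal and its font, which the JVM does not control.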
Answer by tangens
You should make the conversion explicitly:
byte[] byteArray = "abcd".getBytes("ISO-8859-1");
String s = new String(byteArray, "ISO-8859-1");
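As a rough illustration of why the charset should be named on both sides, the sketch below round-trips U+00BD with matching and with mismatched charsets. The class name ExplicitRoundTrip and the helper roundTrip are hypothetical, and the character is written as a Unicode escape so the source file encoding does not matter.

```java
import java.nio.charset.Charset;

public class ExplicitRoundTrip {
    // Hypothetical helper: encode with one charset, decode with another.
    static String roundTrip(String s, String encodeWith, String decodeWith) {
        byte[] bytes = s.getBytes(Charset.forName(encodeWith));
        return new String(bytes, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String half = "\u00bd";

        // Matching charsets restore the original string.
        System.out.println(roundTrip(half, "ISO-8859-1", "ISO-8859-1").equals(half)); // true
        System.out.println(roundTrip(half, "UTF-8", "UTF-8").equals(half));           // true

        // Mismatch: the UTF-8 bytes C2 BD decoded as ISO-8859-1 become two
        // characters, U+00C2 followed by U+00BD.
        System.out.println(roundTrip(half, "UTF-8", "ISO-8859-1").equals("\u00c2\u00bd")); // true
    }
}
```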
EDIT:
It seems that the problem is the encoding of your Java file. If it works on Windows, try compiling the source files on Linux with javac -encoding ISO-8859-1. This should solve your problem.
Answer by BalusC
Your problem is a bit vague. You mentioned that -Dfile.encoding solved your Linux problem, but this is in fact only used to inform the Sun(!) JVM which encoding to use to manage filenames/pathnames on the local disk file system. And this doesn't fit the problem description you literally gave: "converting chars to bytes and back to chars failed". I don't see what -Dfile.encoding has to do with this. There must be more to the story. How did you conclude that it failed? Did you read/write those characters from/into a pathname/filename or so? Or were you maybe printing to stdout? Did the stdout itself use the proper encoding?
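One way to start answering those questions is to print what the JVM itself reports, as in this minimal sketch. The class name EncodingReport and the helper report are hypothetical. It only shows the JVM's view and says nothing about what the terminal actually expects.

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Hypothetical helper: summarize the JVM's default charset and the
    // file.encoding system property in one line.
    static String report() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once normally and once with -Dfile.encoding=ISO-8859-1 shows whether the option changes the values reported on a given JVM.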
That said, why would you want to convert the chars back and forth to/from bytes? I don't see any useful business purpose for this.
(Sorry, this didn't fit in a comment, but I will update this with the answer once you have given more info about the actual functional requirement.)
Update: as per the comments, you basically just need to configure the stdout/cmd so that it uses the proper encoding to display those characters. In Windows you can do that with the chcp command, but there's one major caveat: the standard fonts used in Windows cmd do not have the proper glyphs (the actual font pictures) for characters outside the ISO-8859 charsets. You can hack one thing or another in the registry to add proper fonts. I can't say much about Linux as I don't use it extensively, but it looks like -Dfile.encoding is somehow the way to go. After all, I think it's better to replace cmd with a cross-platform UI tool to display the characters the way you want, for example Swing.