Java charset problem on Linux

Question

Asked by Inv3r53

Problem: I have a string containing special characters, which I convert to bytes and back. The conversion works properly on Windows, but on Linux the special character is not converted properly. The default charset on Linux is UTF-8, as reported by Charset.defaultCharset().displayName().

However, if I run on Linux with the option -Dfile.encoding=ISO-8859-1, it works properly.

How can I make it work with the UTF-8 default charset, without setting the -D option in a Unix environment?

Edit: I use jdk1.6.13.

Edit: the code snippet below works with cs = "ISO-8859-1"; or cs = "UTF-8"; on Windows, but not on Linux.

        String x = "½";
        System.out.println(x);
        byte[] ba = x.getBytes(Charset.forName(cs));
        for (byte b : ba) {
            System.out.println(b);
        }
        String y = new String(ba, Charset.forName(cs));
        System.out.println(y);

~regards daed

Answer 1

Accepted answer by McDowell

Your characters are probably being corrupted by the compilation process and you're ending up with junk data in your class file.

You wrote: "if I run on Linux with option -Dfile.encoding=ISO-8859-1 it works properly."

The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

In short, don't use -Dfile.encoding=...

    String x = "½";

Since U+00bd (½) will be represented by different values in different encodings:

windows-1252     BD
UTF-8            C2 BD
ISO-8859-1       BD

...you need to tell the compiler which encoding your source file uses:

javac -encoding ISO-8859-1 Foo.java
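
The byte values in the table above can be checked with a minimal sketch like the following. The class name `HalfBytes` and the helper `hex` are hypothetical names chosen for illustration. The string is written as the Unicode escape `\u00bd`, so the result does not depend on how the source file itself is encoded or on the `-encoding` flag.

```java
import java.nio.charset.Charset;

public class HalfBytes {
    // Returns the bytes of s in the named charset as space-separated upper-case hex.
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String x = "\u00bd"; // ½, written as an escape to sidestep source encoding
        for (String cs : new String[] {"UTF-8", "ISO-8859-1"}) {
            System.out.println(cs + " " + hex(x, cs));
        }
    }
}
```

If a literal `"½"` in the source produces different bytes than the escape does, that suggests the compiler read the source file with the wrong encoding.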

Now we get to this one:

    System.out.println(x);

As a PrintStream, this will encode data to the system encoding prior to emitting the byte data. Like this:

    System.out.write(x.getBytes(Charset.defaultCharset()));

That may or may not work as you expect on some platforms: the byte encoding must match the encoding the console expects for the characters to show up correctly.
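
One way to avoid relying on the default charset when printing is to wrap `System.out` in a `PrintStream` with an explicit encoding. This is only a sketch: the class name `ExplicitConsole` and the helper `printed` are hypothetical, and the choice of UTF-8 in `main` assumes a terminal that expects UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ExplicitConsole {
    // Returns the bytes a PrintStream emits for s when it encodes with charsetName.
    static byte[] printed(String s, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(s);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: the console expects UTF-8; substitute your terminal's encoding.
        PrintStream console = new PrintStream(System.out, true, "UTF-8");
        console.println("\u00bd");
    }
}
```

The helper makes the encoding step observable: the same character leaves the stream as two bytes in UTF-8 and as one byte in ISO-8859-1, which is why the console and the stream have to agree.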

Answer 2

Answer by tangens

You should make the conversion explicitly:

byte[] byteArray = "abcd".getBytes( "ISO-8859-1" );
new String( byteArray, "ISO-8859-1" );
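
As a rough illustration of why the charset should be named on both sides, the sketch below round-trips ½ with matching and with mismatched charsets. The class name `RoundTrip` and its helper are hypothetical. It uses `Charset.forName`, which is available on Java 6, and a Unicode escape so the source file's encoding plays no part.

```java
import java.nio.charset.Charset;

public class RoundTrip {
    // Encodes s with one charset, then decodes the resulting bytes with another.
    static String roundTrip(String s, String encodeWith, String decodeWith) {
        byte[] bytes = s.getBytes(Charset.forName(encodeWith));
        return new String(bytes, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String x = "\u00bd"; // ½
        // Matching charsets preserve the string.
        System.out.println(x.equals(roundTrip(x, "UTF-8", "UTF-8")));           // true
        System.out.println(x.equals(roundTrip(x, "ISO-8859-1", "ISO-8859-1"))); // true
        // Mismatched charsets do not: bytes C2 BD decode as two Latin-1 characters.
        System.out.println(x.equals(roundTrip(x, "UTF-8", "ISO-8859-1")));      // false
    }
}
```

With an explicit, matching charset the round trip behaves the same on Windows and Linux. If the result still differs between platforms, the string literal was probably already wrong before `getBytes` ran.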

EDIT:

It seems that the problem is the encoding of your Java source file. If it works on Windows, try compiling the source files on Linux with javac -encoding ISO-8859-1. This should solve your problem.

Answer 3

Answer by BalusC

Your problem is a bit vague. You mentioned that -Dfile.encoding solved your Linux problem, but this is in fact only used to inform the Sun(!) JVM which encoding to use to manage filenames/pathnames on the local disk file system. And ... this doesn't fit the problem description you literally gave: "converting chars to bytes and back to chars failed". I don't see what -Dfile.encoding has to do with this. There must be more to the story. How did you conclude that it failed? Did you read/write those characters from/into a pathname/filename or so? Or were you maybe printing to stdout? Did the stdout itself use the proper encoding?

That said, why would you want to convert the chars back and forth to/from bytes? I don't see any useful business purpose for this.

(Sorry, this didn't fit in a comment, but I will update this with the answer if you give more info about the actual functional requirement.)

Update: as per the comments: you basically just need to configure the stdout/cmd so that it uses the proper encoding to display those characters. In Windows you can do that with the chcp command, but there's one major caveat: the standard fonts used in Windows cmd do not have the proper glyphs (the actual font pictures) for characters outside the ISO-8859 charsets. You can hack one or the other in the registry to add proper fonts. No word about Linux as I don't use it extensively, but it looks like -Dfile.encoding is somehow the way to go. After all ... I think it's better to replace cmd with a cross-platform UI tool to display the characters the way you want, for example Swing.
