Source: http://stackoverflow.com/questions/35652281/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
UTF-8 characters with JAXB in Java 8
Asked by kirsty
I recently migrated an application from JBoss AS 5 to WildFly 8, and as a result had to move from Java 6 to Java 8.
I'm now encountering a problem when running one of my unit tests through Ant:
[javac] C:\Users\test\JAXBClassTest.java:123: error: unmappable character for encoding UTF8
Line 123 of the test class is:
Assert.assertEquals("Jμhn", JAXBClass.getValue());
This test is in place specifically to ensure that the JAXB marshaller can handle UTF-8 characters, which I believe μ is. I have added a property to the JAXB marshaller to ensure that these characters are allowed:
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
I've seen multiple questions (1, 2, 3) on Stack Overflow which seem to be similar, but their answers either explain why invalid characters that were previously decoded one way are now decoded another way, or don't appear to describe the same issue as mine.
If all the characters are valid, should this cause an issue? I know I must be missing something, but I can't see what.
Accepted answer by SubOptimal
The problem is that in your source file the μ (the micro sign, U+00B5) is stored as the single byte \265 (octal, 0xB5), which is its ISO-8859-1 encoding. On its own that byte is not valid UTF-8. Encoded as UTF-8, the character is the two-byte sequence 0xC2 0xB5.
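To make the byte values concrete, here is a minimal sketch that prints how the micro sign is encoded in each charset. The class name `MicroSignBytes` and the `hex` helper are illustrative names, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption made for illustration.
class MicroSignBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "C2 B5". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Written as a Unicode escape so this file's own encoding does not matter.
        String micro = "\u00B5";
        System.out.println("ISO-8859-1: " + hex(micro.getBytes(StandardCharsets.ISO_8859_1))); // B5
        System.out.println("UTF-8:      " + hex(micro.getBytes(StandardCharsets.UTF_8)));      // C2 B5
    }
}
```

The same character is one byte in ISO-8859-1 and two bytes in UTF-8, which is why a file saved in one encoding cannot simply be read as the other.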
In this source the character encoding for the file is ISO-8859-1.
class Latin1 {
    public static void main(String[] args) {
        String s = "μ"; // stored in the file as the single byte \265 (0xB5)
        System.out.println(s);
    }
}
This can be compiled with ...
javac -encoding iso8859-1 Latin1.java
... but it fails with UTF-8 encoding
javac -encoding UTF-8 Latin1.java
Latin1.java:3: error: unmappable character for encoding UTF-8
String s = "?";
^
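The failure can be reproduced outside the compiler. The sketch below is not how javac is implemented internally; it only assumes that a strict UTF-8 decoder rejects the same byte. `Utf8Check` and `isValidUtf8` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the original answer.
class Utf8Check {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1Micro = {(byte) 0xB5};            // micro sign as saved in ISO-8859-1
        byte[] utf8Micro = {(byte) 0xC2, (byte) 0xB5}; // micro sign as saved in UTF-8
        System.out.println(isValidUtf8(latin1Micro)); // false
        System.out.println(isValidUtf8(utf8Micro));   // true
    }
}
```

To check a real source file, the same method could be called on the result of `Files.readAllBytes(Paths.get("JAXBClassTest.java"))`.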
In this source the character encoding for the file is UTF-8.
class Utf8 {
    public static void main(String[] args) {
        String s = "μ"; // stored in the file as the two bytes 0xC2 0xB5
        System.out.println(s);
    }
}
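This can be compiled with ISO-8859-1 as well as with UTF-8. Note that with ISO-8859-1 the compiler reads the two bytes as two separate characters (U+00C2 and U+00B5), so it compiles without error but the string no longer holds the single character μ.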
javac -encoding UTF-8 Utf8.java
javac -encoding iso8859-1 Utf8.java
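A successful compile with the wrong encoding does not mean the string is correct. As a small sketch (class and method names are illustrative assumptions), decoding the UTF-8 bytes as ISO-8859-1 never fails, because every byte value maps to some ISO-8859-1 character:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
class SilentMisdecode {
    /** Decodes bytes as ISO-8859-1, which accepts every possible byte value. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8Micro = "\u00B5".getBytes(StandardCharsets.UTF_8); // C2 B5
        String misread = decodeAsLatin1(utf8Micro);
        System.out.println(misread.length());                            // 2
        System.out.println(Integer.toHexString(misread.charAt(0)));      // c2
        System.out.println(Integer.toHexString(misread.charAt(1)));      // b5
    }
}
```

This is the mirror image of the error in the question: UTF-8 rejects the stray 0xB5 byte, while ISO-8859-1 accepts anything and silently produces an extra character.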
Edit: In case copying and pasting from the web page alters the encoding, both source files can be created as shown below, which should make the difference visible.
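import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// The wrapper class name is arbitrary; the original snippet showed only the method body.
class CreateSources {
    public static void main(String[] args) throws IOException {
        // \u00B5 is the micro sign; the escape keeps this file itself pure ASCII.
        String latin1 = "class Latin1 {\n"
                + "    public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + "    }\n"
                + "}";
        // Written as the single byte 0xB5.
        Files.write(Paths.get("Latin1.java"),
                latin1.getBytes(StandardCharsets.ISO_8859_1));
        String utf8 = "class Utf8 {\n"
                + "    public static void main(String[] args) {\n"
                + "        String s = \"\u00B5\";\n"
                + "        System.out.println(s);\n"
                + "    }\n"
                + "}";
        // Written as the two bytes 0xC2 0xB5.
        Files.write(Paths.get("Utf8.java"),
                utf8.getBytes(StandardCharsets.UTF_8));
    }
}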
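One possible way to make a test literal independent of the source-file encoding, which the answer above does not itself propose, is to write the character as a Unicode escape so the file contains only ASCII. A minimal sketch, with `EscapedLiteral` and `name` as hypothetical names:

```java
// Hypothetical demo class; not part of the original answer.
class EscapedLiteral {
    /** Returns the test name with the micro sign written as a Unicode escape. */
    static String name() {
        return "J\u00B5hn"; // the source file holds only ASCII bytes here
    }

    public static void main(String[] args) {
        System.out.println(name().length());                       // 4
        System.out.println(Integer.toHexString(name().charAt(1))); // b5
    }
}
```

With a literal like this, the file compiles to the same string whether javac is told the source is UTF-8 or ISO-8859-1.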