JAXB & UTF-8 Unmarshal exception "Invalid byte 2 of 2-byte UTF-8 sequence"
Original question: http://stackoverflow.com/questions/18193103/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Entropy
I've read a few SO answers that say JAXB has a bug, which it blames on the nature of XML, that causes it to not work with UTF-8. My question is: what is the workaround? Users may copy and paste Unicode characters into a data field, and I need to preserve, marshal, unmarshal, and redisplay them elsewhere.
(update) More Context:
Candidate c = new Candidate();
c.addSubstitution("3 4ths", "\u00BE");
c.addSubstitution("n with tilde", "\u00F1");
c.addSubstitution("schwa", "\u018F");
c.addSubstitution("Sigma", "\u03A3");
c.addSubstitution("Cyrillic Th", "\u040B");
jc = JAXBContext.newInstance(Candidate.class);
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(c, os);
String xml = os.toString();
System.out.println(xml);
jc = JAXBContext.newInstance(Candidate.class);
Unmarshaller jaxb = jc.createUnmarshaller();
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes());
Candidate newCandidate = (Candidate) jaxb.unmarshal(is);
for (Substitution s : c.getSubstitutions()) {
    System.out.println(s.getSubstitutionName() + "='" + s.getSubstitutionValue() + "'");
}
Here's a little test I threw together. The exact characters I get are not entirely under my control; users may paste an N with tilde, or anything else, into the field.
Accepted answer by Jon Skeet
This is the problem in your test code:
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes());
You're using the platform default encoding to convert the string to a byte array. Don't do that. You've specified that you're going to use UTF-8, so you must do so when you create the byte array:
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
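To see why the mismatch produces a malformed-UTF-8 error, here is a minimal sketch that does not use JAXB at all. It assumes the platform default behaves like ISO-8859-1 (the real default depends on the machine), and the class and method names are made up for illustration. It encodes the same XML text two ways and checks each byte array with a strict UTF-8 decoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Returns true only if the bytes are well-formed UTF-8.
    // A fresh decoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<value>\u00F1</value>"; // n with tilde

        // Stand-in for xml.getBytes() on a machine whose default is ISO-8859-1 (assumption).
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        // What the answer recommends: encode explicitly as UTF-8.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

In ISO-8859-1 the character is the single byte 0xF1, which a UTF-8 decoder reads as the start of a multi-byte sequence that the following `<` cannot continue. A parser told to expect UTF-8 fails in the same way.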
Likewise, don't use ByteArrayOutputStream.toString(), which again uses the platform default encoding. Indeed, you don't need to convert the output to a string at all:
ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(c, os);
byte[] xml = os.toByteArray();
jc = JAXBContext.newInstance(Candidate.class);
Unmarshaller jaxb = jc.createUnmarshaller();
ByteArrayInputStream is = new ByteArrayInputStream(xml);
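The same bytes-only idea can be tried without the `Candidate` class. The sketch below is not the code from the question: it swaps in the JDK's DOM parser for JAXB, writes a tiny hand-built document (the `value` element and the `roundTrip` helper are hypothetical), and assumes the text contains no markup characters that would need escaping.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ByteRoundTripDemo {
    // Writes the text as UTF-8 XML bytes and parses those bytes directly,
    // never converting them to a String in between.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
                w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
                w.write("<value>" + text + "</value>"); // assumes text needs no escaping
            }
            byte[] xml = os.toByteArray();

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The five characters from the question.
        String original = "\u00BE\u00F1\u018F\u03A3\u040B";
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

Because the byte array goes straight from the output stream to the parser, the platform default encoding never gets a chance to interfere.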
This should have no problems with the characters you're using. It will still have problems with characters that can't be represented in XML 1.0 (characters below U+0020 other than \r, \n and \t), but that's all.
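If pasted input might contain such control characters, one option (not part of the original answer) is to filter them out before marshalling. The sketch below checks code points against the Char production of the XML 1.0 specification, which excludes the low control characters mentioned above as well as unpaired surrogates, U+FFFE and U+FFFF. The class and method names are hypothetical.

```java
public class Xml10Chars {
    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point that XML 1.0 cannot represent.
    static String stripInvalid(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
                .filter(Xml10Chars::isXml10Char)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0001 is removed; the tab and the n with tilde are kept.
        System.out.println(stripInvalid("a\u0001b\tc\u00F1").equals("ab\tc\u00F1")); // true
    }
}
```

Silently dropping characters changes user data, so whether to strip, replace, or reject such input is a design decision for the application.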