Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3039998/

Date: 2020-10-30 00:04:08  Source: igfitidea

Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

Tags: java, xml, encoding, utf-8, jaxb

Asked by Marcus Leon

We're parsing an XML document using JAXB and get this error:

[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)

What exactly does this mean, and how can we resolve it?

We are executing the code as:

jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));


Update

The issue appears to be due to this "funny" character in the XML file: ¿

Why would this cause such a problem?

Update 2

There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.

Update 3

Here is the full XML snippet:

<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>

Update 4

Note that there is no <?xml ...?> header.

The hex value of the special character is BF.

Accepted answer by axtavt

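So, your problem is that JAXB treats XML files without a <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF character is actually intended to mean ¿). In UTF-8, 0xBF is a continuation byte and cannot start a character, which is why the parser complains about byte 1 of a 1-byte sequence.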
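
As a rough illustration of that point, the following sketch decodes the lone byte 0xBF strictly, first as UTF-8 and then as ISO-8859-1. It uses only JDK classes; the class name `LoneByteDemo` and the helper `strictDecode` are made up for this example and have nothing to do with JAXB itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LoneByteDemo {

    // Hypothetical helper: decodes strictly and returns null when the
    // bytes are not valid in the given charset (instead of substituting U+FFFD).
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] lone = { (byte) 0xBF }; // the byte reported in the question

        // 0xBF is 10111111 in binary: a UTF-8 continuation byte with no lead byte.
        System.out.println("UTF-8:      " + strictDecode(lone, StandardCharsets.UTF_8));

        // ISO-8859-1 maps every byte to the code point of the same value, here U+00BF.
        System.out.println("ISO-8859-1: " + strictDecode(lone, StandardCharsets.ISO_8859_1));
    }
}
```

The strict decoder is used on purpose: the default `new String(bytes, charset)` silently replaces malformed input, which would hide the very problem the XML parser reports.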

If you can change the producer of the file, you can add a <?xml ...?> header that declares the actual encoding, or simply write the file in UTF-8.
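
Both producer-side options can be tried out with the JDK's built-in XML parser, without JAXB on the classpath. The sketch below is only an approximation of the setup in the question: the class name `DeclaredEncodingDemo`, the helper `parse`, and the assumption that the producer writes ISO-8859-1 are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {

    // The snippet from the question, with the special character written as U+00BF.
    static final String BODY =
            "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";

    // Hypothetical helper: returns the root element's text, or null if parsing fails.
    // Catching Exception keeps the demo short; real code should handle
    // SAXException and IOException separately.
    static String parse(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // No header, ISO-8859-1 bytes: the parser assumes UTF-8 and rejects 0xBF.
        byte[] bare = BODY.getBytes(StandardCharsets.ISO_8859_1);

        // Option 1: same bytes, but the header declares the real encoding.
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + BODY)
                .getBytes(StandardCharsets.ISO_8859_1);

        // Option 2: no header, but the producer writes UTF-8.
        byte[] utf8 = BODY.getBytes(StandardCharsets.UTF_8);

        System.out.println("no header, ISO-8859-1: " + parse(bare));
        System.out.println("declared ISO-8859-1:   " + parse(declared));
        System.out.println("no header, UTF-8:      " + parse(utf8));
    }
}
```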

If you can't change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't allow changing its default encoding:

results = (Results) unmarshaller.unmarshal(
   new InputStreamReader(new FileInputStream(inputFile), "ISO-8859-1")); 

However, this solution is fragile: it fails on input files whose <?xml ...?> header specifies a different encoding, because a parser that reads from a Reader uses the Reader's charset and ignores the declared one.
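
One possible way to soften that fragility, offered here as a sketch rather than as part of the original answer, is to peek at the first bytes and apply the explicit charset only when the file does not describe its own encoding. The class `FallbackEncodingSource` and its methods are hypothetical names; the check is deliberately simple and would also treat a leading `<?xml-stylesheet` instruction as a declaration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.xml.sax.InputSource;

public class FallbackEncodingSource {

    // Hypothetical helper: returns a byte-stream InputSource when the input starts
    // with a BOM or "<?xml" (so the parser detects the encoding itself), and a
    // character-stream InputSource decoded with the fallback charset otherwise.
    static InputSource open(InputStream raw, Charset fallback) {
        BufferedInputStream in = new BufferedInputStream(raw);
        byte[] head = new byte[5];
        int n = 0;
        try {
            in.mark(head.length);
            while (n < head.length) {
                int read = in.read(head, n, head.length - n);
                if (read < 0) {
                    break; // input shorter than five bytes
                }
                n += read;
            }
            in.reset(); // rewind so the parser sees the whole document
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (selfDescribing(head, n)) {
            return new InputSource(in);
        }
        return new InputSource(new InputStreamReader(in, fallback));
    }

    // True when the first bytes are a UTF-8/UTF-16 BOM or the ASCII text "<?xml".
    static boolean selfDescribing(byte[] head, int n) {
        boolean utf8Bom = n >= 3 && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        boolean utf16Bom = n >= 2
                && ((head[0] == (byte) 0xFE && head[1] == (byte) 0xFF)
                || (head[0] == (byte) 0xFF && head[1] == (byte) 0xFE));
        boolean declaration =
                new String(head, 0, n, StandardCharsets.US_ASCII).equals("<?xml");
        return utf8Bom || utf16Bom || declaration;
    }

    public static void main(String[] args) {
        byte[] bare = "<a>\u00BF</a>".getBytes(StandardCharsets.ISO_8859_1);
        InputSource source = open(new ByteArrayInputStream(bare), StandardCharsets.ISO_8859_1);
        System.out.println("uses fallback reader: " + (source.getCharacterStream() != null));
    }
}
```

Since `Unmarshaller` has an `unmarshal(InputSource)` overload, the returned source could be passed in place of the `InputStreamReader` above, with `new FileInputStream(inputFile)` as the raw stream.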

Answer by skaffman

That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a UTF-encoded file. They are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.
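
Whether a BOM is actually present is easy to check by looking at the first bytes of the file. The sketch below is one way to do that; `BomCheck` and `detectBom` are illustrative names, and UTF-32 BOMs are left out for brevity. For the file in the question, where 0xBF sits in the middle of the data, such a check would be expected to report no BOM.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {

    // Returns the encoding implied by a BOM at the start of the data, or "none".
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        return "none";
    }

    // Compares the leading bytes (as unsigned values) with the given prefix.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+FEFF encoded as UTF-8 yields the bytes EF BB BF.
        byte[] withBom = "\uFEFF<a/>".getBytes(StandardCharsets.UTF_8);
        byte[] fromQuestion = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(detectBom(withBom));      // UTF-8
        System.out.println(detectBom(fromQuestion)); // none
    }
}
```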

Try rephrasing your code to use a Reader rather than an InputStream:

results = (Results) unmarshaller.unmarshal(new FileReader(inputFile));

A Reader is UTF-aware, and might make a better stab at it. More simply, pass the File directly to the Unmarshaller, and let the JAXBContext worry about it:

results = (Results) unmarshaller.unmarshal(inputFile);

Answer by Andy

It sounds as if your XML is encoded with UTF-16 but that encoding is not getting passed to the Unmarshaller. With the Marshaller you can set that using marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-16"); but because the Unmarshaller is not required to support any properties, I am not sure how to enforce that other than ensuring your XML document has encoding="UTF-16" in the initial <?xml?> declaration.
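
If the document really were UTF-16, a declaration plus BOM should be enough for the parser to pick the encoding up on its own. The sketch below checks this with the JDK's built-in parser rather than with JAXB; `Utf16DeclarationDemo` and `rootText` are made-up names, and the sample content is borrowed from the question purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class Utf16DeclarationDemo {

    // Hypothetical helper: returns the root element's text, or null if parsing fails.
    static String rootText(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            return null; // kept broad for brevity in this demo
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<Description>Mt. Belvieu \u00BF Texas</Description>";

        // Java's UTF_16 charset writes a big-endian BOM followed by the text.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);

        System.out.println(rootText(utf16));
    }
}
```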
