Java: Invalid byte 2 of 2-byte UTF-8 sequence

Question

Asked by flyingfromchina

I am trying to parse an XML file declared with <?version = 1.0, encoding = UTF-8> but ran into the error message "Invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what causes this problem?

Answer 1

Answered by Ignacio Vazquez-Abrams

Either the parser is set for UTF-8 even though the file is encoded otherwise, or the file is declared as UTF-8 but is not actually encoded that way.
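When the real encoding of the file is known, one way around the mismatch is to decode the bytes yourself and hand the parser a character stream, since a SAX InputSource built from a Reader is read as characters rather than bytes. The following is a minimal sketch under that assumption; the class name ExplicitEncodingParse, the method rootText, and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExplicitEncodingParse {

    // Decodes the bytes with the charset the file really uses, then parses
    // the resulting characters and returns the text of the root element.
    static String rootText(byte[] xmlBytes, Charset actualCharset) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), actualCharset)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: Latin-1 bytes with no XML declaration.
        // \u00C5 is the letter A with ring above.
        byte[] latin1Xml = "<name>\u00C5ke</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1Xml, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the same bytes to the parser as a raw InputStream would make it assume UTF-8, which is the situation that produces the error in the question.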

Answer 2

Answered by StaxMan

Most commonly this happens when the input is ISO-8859-x (Latin-x, such as Latin-1) but the parser thinks it is getting UTF-8. Certain sequences of Latin-1 characters (for example, two consecutive characters with accents or umlauts) form something that is invalid as UTF-8: based on the first byte, the second byte has unexpected high-order bits.

This can easily occur when some process dumps out XML using Latin-1 but either forgets to output the XML declaration (in which case the XML parser must default to UTF-8, as per the XML specification) or claims the content is UTF-8 when it isn't.
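To make the "unexpected high-order bits" concrete: in UTF-8 a byte of the form 110xxxxx announces a 2-byte sequence, and the byte after it must have the form 10xxxxxx. The sketch below checks just that pair rule. The class and method names are invented for illustration, and it deliberately ignores other UTF-8 rules such as overlong encodings.

```java
import java.nio.charset.StandardCharsets;

class TwoByteSequence {

    // 110xxxxx: lead byte of a 2-byte UTF-8 sequence.
    static boolean isTwoByteLead(int b) {
        return (b & 0xE0) == 0xC0;
    }

    // 10xxxxxx: continuation byte.
    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    // True when 'first' starts a 2-byte sequence but 'second' cannot finish it.
    static boolean breaksTwoByteSequence(byte first, byte second) {
        return isTwoByteLead(first & 0xFF) && !isContinuation(second & 0xFF);
    }

    public static void main(String[] args) {
        // Two umlauts (U+00C4, U+00D6) encoded as Latin-1: bytes 0xC4 0xD6.
        byte[] latin1 = "\u00C4\u00D6".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(breaksTwoByteSequence(latin1[0], latin1[1])); // true

        // The first umlaut encoded as UTF-8: bytes 0xC3 0x84.
        byte[] utf8 = "\u00C4".getBytes(StandardCharsets.UTF_8);
        System.out.println(breaksTwoByteSequence(utf8[0], utf8[1])); // false
    }
}
```

In Latin-1 the bytes 0xC0 to 0xDF are accented capital letters, so any of them followed by a byte outside 0x80 to 0xBF trips this check.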

Answer 3

Answered by atott

You could try changing the default character encoding used by String.getBytes() to UTF-8 by passing the VM option -Dfile.encoding=utf-8.
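As an alternative to changing the JVM-wide default, the charset can be named at the call site, so the result no longer depends on how the JVM was started. This is a different technique from the VM option above, shown here as a small sketch; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetBytes {

    // Encodes the text as UTF-8 regardless of the JVM's default charset.
    static byte[] utf8Bytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the UTF-8 bytes in a stream that can be handed to an XML parser.
    static InputStream utf8Stream(String xml) {
        return new ByteArrayInputStream(utf8Bytes(xml));
    }

    public static void main(String[] args) {
        // \u00C5 needs two bytes in UTF-8 (0xC3 0x85), so "\u00C5ke" is 4 bytes.
        System.out.println(utf8Bytes("\u00C5ke").length); // 4
    }
}
```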

Answer 4

Answered by Spenhouet

I had the same problem. I had created a new XML file with jdom and FileWriter(xmlFile), and the FileWriter did not produce a UTF-8 file. Using FileOutputStream(xmlFile) instead solved it.
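A likely explanation is that FileWriter(File) encodes with the JVM's default charset, whereas a FileOutputStream writes raw bytes and lets the XML library choose the encoding. The sketch below leaves jdom out and only shows the underlying idea of wrapping a FileOutputStream in a writer with an explicit UTF-8 charset; the class name, method names, and temp-file prefix are assumptions made for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class Utf8XmlWriter {

    // Writes the text as UTF-8 no matter what the default charset is.
    static void write(File file, String xml) {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes to a temporary file and returns the bytes on disk.
    static byte[] writeToTempFileAndReadBack(String xml) {
        try {
            File file = File.createTempFile("utf8-demo", ".xml");
            try {
                write(file, xml);
                return Files.readAllBytes(file.toPath());
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] onDisk = writeToTempFileAndReadBack("<name>\u00C5ke</name>");
        System.out.println(onDisk.length + " bytes written");
    }
}
```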

Answer 5

Answered by Salah Klein

For those who still get this error: since UTF-8 is expected, check your XML document for accented or other non-ASCII Latin letters. I had the same problem, and the reason was that I had this:

<n:name>Åke Jógvan Øyvind</n:name>
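In a large document such characters can be hard to spot by eye. One possible way to find the first offending byte is to run the raw bytes through a strict UTF-8 decoder and read the input position when it stops. This is a sketch using the standard CharsetDecoder API; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Locator {

    // Returns the offset of the first byte that is not valid UTF-8,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input buffer is positioned at the bad bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        byte[] latin1 = "<n>\u00C5ke</n>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstInvalidOffset(latin1));

        byte[] utf8 = "<n>\u00C5ke</n>".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstInvalidOffset(utf8));
    }
}
```

The returned offset can then be looked up in a hex editor, or used to slice out a few surrounding bytes for inspection.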

Hope this helps.

Answer 6

Answered by Athu

I had the same problem when trying to import my .xml file into my Java tool, and I found a workaround:

1. Open the .xml file with Notepad++ and save it as an .rtf file, then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it with Notepad, and save it as an .xml file again. When saving in Notepad, choose the option "Encoding: UTF-8" near the bottom of the save dialog.

It worked for me; I hope it is useful for you too.
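The same re-encoding can be done in code instead of going through several editors. The sketch below assumes the file is really ISO-8859-1 (an assumption; substitute whatever the actual source encoding is) and rewrites it as UTF-8. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Transcoder {

    // Decodes the bytes with one charset and re-encodes them with another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Reads 'source' in the given charset and writes it to 'target' as UTF-8.
    static void convertFileToUtf8(Path source, Path target, Charset from) {
        try {
            byte[] original = Files.readAllBytes(source);
            Files.write(target, transcode(original, from, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "<name>\u00C5ke</name>".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // One extra byte: the single non-ASCII letter takes two bytes in UTF-8.
        System.out.println(latin1.length + " -> " + utf8.length);
    }
}
```

If the document carries an XML declaration that names a different encoding, that declaration needs to be updated as well, or the parser will be misled in the other direction.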

Answer 7

Answered by Oleksii Kyslytsyn

Switching the encoding used for the input might help in this case:

// Pass the encoding the input actually uses as the second argument.
XMLEventReader eventReader =
        inputFactory.createXMLEventReader(in, "utf-8"); // or e.g. "windows-1251"
