Java: invalid byte 2 of 2-byte UTF-8 sequence
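Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow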
Original URL: http://stackoverflow.com/questions/2421272/
Question by flyingfromchina
I am trying to parse an XML file with <?xml version="1.0" encoding="UTF-8"?> but ran into the error message "invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what caused this problem?
Answer by Ignacio Vazquez-Abrams
Either the parser is set for UTF-8 even though the file is encoded otherwise, or the file is declared as UTF-8 but isn't actually UTF-8.
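Suppose you know what encoding the file really uses. One way to act on this answer is to decode the bytes yourself and hand the parser a character stream. According to the SAX InputSource documentation, a parser that receives a character stream reads it directly and disregards any encoding declaration inside it. Below is a minimal sketch using the JDK's built-in DOM parser. It assumes the file is really ISO-8859-1. The class name, the method name and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class KnownEncodingParse {
    // Decodes the bytes with the charset they are really in and gives the parser
    // a Reader, so neither the UTF-8 default nor a wrong declaration is applied.
    static Document parseWithCharset(InputStream in, Charset actual) throws Exception {
        InputSource source = new InputSource(new InputStreamReader(in, actual));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: declared as UTF-8 but actually written as Latin-1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5ke</name>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseWithCharset(new ByteArrayInputStream(latin1Bytes),
                StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent().equals("\u00C5ke")); // true
    }
}
```

The caveat is that this only helps when the real encoding is known. Guessing wrong silently produces mojibake instead of an error.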
Answer by StaxMan
Most commonly it's caused by feeding ISO-8859-x (Latin-x, such as Latin-1) content to a parser that thinks it is getting UTF-8. Certain sequences of Latin-1 characters (for example, two consecutive characters with accents or umlauts) form byte sequences that are invalid as UTF-8. Specifically, the first byte tells the parser what kind of second byte to expect, but the second byte has unexpected high-order bits.
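To make the bit-level point concrete: a 2-byte UTF-8 sequence has the form 110xxxxx 10xxxxxx. After a lead byte in the range 0xC2 to 0xDF, the parser therefore expects a byte whose top two bits are 10. The sketch below (hypothetical class and method names) checks only that rule. Note that an accented Latin-1 letter followed by a plain ASCII letter breaks the rule too, not only two accented letters in a row.

```java
public class TwoByteUtf8Check {
    // A 2-byte UTF-8 sequence is 110xxxxx 10xxxxxx. Lead bytes 0xC0 and 0xC1
    // would be overlong encodings, so the valid lead range is 0xC2..0xDF.
    static boolean isValidTwoByteSequence(int b1, int b2) {
        boolean leadOk = b1 >= 0xC2 && b1 <= 0xDF;
        boolean continuationOk = (b2 & 0xC0) == 0x80; // top two bits must be 10
        return leadOk && continuationOk;
    }

    public static void main(String[] args) {
        // Latin-1 A-ring (0xC5) then 'k' (0x6B = 01101011): 0xC5 announces a
        // 2-byte sequence, but 0x6B is not a continuation byte.
        System.out.println(isValidTwoByteSequence(0xC5, 0x6B)); // false
        // Latin-1 A-ring then O-umlaut (0xD6 = 11010110): top bits are 11, not 10.
        System.out.println(isValidTwoByteSequence(0xC5, 0xD6)); // false
        // A-ring encoded as real UTF-8 is C3 85, and 0x85 = 10000101.
        System.out.println(isValidTwoByteSequence(0xC3, 0x85)); // true
    }
}
```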
This can easily happen when some process writes XML out as Latin-1 but either forgets to output the XML declaration (in which case the XML parser must default to UTF-8, as per the XML specification) or claims the content is UTF-8 even when it isn't.
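Both situations can be reproduced with the JDK's built-in DOM parser. This sketch uses a tiny hypothetical document containing the name "Åke", written out as Latin-1. The parser rejects those bytes when the declaration claims UTF-8 and when the declaration is missing, and it accepts them once the declaration states ISO-8859-1. The exact exception type and message vary between JDK versions, so the helper (hypothetical name) only reports success or failure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MislabeledLatin1Demo {
    // Encodes text as ISO-8859-1, as a Latin-1 writer would.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns true if the bytes parse as XML, false if the parser rejects them.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String body = "<name>\u00C5ke</name>";
        // Latin-1 bytes that claim to be UTF-8.
        System.out.println(parses(latin1("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body))); // false
        // Latin-1 bytes with no declaration, so the parser must assume UTF-8.
        System.out.println(parses(latin1(body))); // false
        // The same Latin-1 bytes, honestly declared.
        System.out.println(parses(latin1("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body))); // true
    }
}
```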
Answer by atott
You could try changing the default character encoding used by String.getBytes() to UTF-8 with the VM option -Dfile.encoding=UTF-8.
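One thing to keep in mind: -Dfile.encoding changes the default for the whole JVM, and since JDK 18 (JEP 400) the default is already UTF-8. Passing the charset explicitly does not depend on either. Below is a small sketch; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encodes with an explicit charset, so the result does not depend on
    // -Dfile.encoding or on the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The charset that the no-argument String.getBytes() would use.
        System.out.println("default charset: " + Charset.defaultCharset());
        // A-ring becomes C3 85 in UTF-8, then 'k' and 'e': 4 bytes in total.
        System.out.println(toUtf8("\u00C5ke").length); // 4
    }
}
```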
Answer by Spenhouet
I had the same problem. I was creating a new XML file with jdom and new FileWriter(xmlFile), and the FileWriter did not write the file as UTF-8, because FileWriter(File) uses the platform default encoding. Using new FileOutputStream(xmlFile) instead solved it.
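The mismatch here is that FileWriter(File) encodes characters with the platform default charset, while the declaration jdom writes by default says UTF-8. With jdom itself, handing XMLOutputter an OutputStream lets the outputter do the encoding, as in the answer. If you write the XML text yourself, one option is to open a writer with an explicit charset. This is a minimal sketch; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class Utf8XmlWriter {
    // Writes XML text as UTF-8 regardless of the platform default charset,
    // so the bytes match an encoding="UTF-8" declaration inside them.
    static void writeUtf8(Path file, String xml) throws IOException {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("example", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5ke</name>";
        writeUtf8(file, xml);
        // The A-ring is stored as the two bytes C3 85, as the declaration promises.
        System.out.println(Arrays.equals(Files.readAllBytes(file),
                xml.getBytes(StandardCharsets.UTF_8))); // true
        Files.delete(file);
    }
}
```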
Answer by Salah Klein
For those who still get this error:
Since UTF-8 is being used, check your XML document for accented Latin letters and similar characters. I had the same problem, and the reason was that I had this:
<n:name>Åke Jógvan Øyvind</n:name>
Hope this helps
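Instead of scanning the file by eye, you can ask the JDK's UTF-8 decoder where the first malformed sequence starts. The sketch below uses CharsetDecoder with CodingErrorAction.REPORT; the helper name is hypothetical. It runs on a line like the one in this answer, first encoded as Latin-1 and then as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Locator {
    // Returns the byte offset of the first malformed UTF-8 sequence,
    // or -1 if the whole array is valid UTF-8.
    static int firstInvalidUtf8Offset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        String line = "<n:name>\u00C5ke J\u00F3gvan \u00D8yvind</n:name>";
        System.out.println(firstInvalidUtf8Offset(line.getBytes(StandardCharsets.ISO_8859_1))); // 8
        System.out.println(firstInvalidUtf8Offset(line.getBytes(StandardCharsets.UTF_8)));      // -1
    }
}
```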
Answer by Athu
I had the same problem when trying to import my .xml file into my Java tool, and I found a good solution for it:
1. Open the .xml file with Notepad++ and save it as an .rtf file. Then open that file in WordPad.
2. Save the .rtf file as a .txt file, then open it with Notepad and save it as an .xml file again. When saving in Notepad, make sure to choose the option "Encoding: UTF-8" near the bottom of the Save dialog.
It worked for me; I hope it's useful for you too.
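The effect of this round trip is a file whose bytes really are UTF-8. If you know the original encoding, you can do the same thing in code. This sketch assumes the source file is ISO-8859-1 (windows-1252 is another common candidate) and uses Files.readString and Files.writeString, which need Java 11 or later. The class and method names are hypothetical. If the file contains an XML declaration naming the old encoding, you still have to update that declaration separately.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscodeToUtf8 {
    // Re-encodes a text file from a known source charset to UTF-8.
    // Any encoding declaration inside the file is copied unchanged.
    static void transcodeToUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        String text = Files.readString(source, sourceCharset);
        Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("latin1", ".xml");
        Path out = Files.createTempFile("utf8", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5ke</name>";
        // Simulate a mislabeled file: declared UTF-8 but written as Latin-1.
        Files.write(in, xml.getBytes(StandardCharsets.ISO_8859_1));
        transcodeToUtf8(in, StandardCharsets.ISO_8859_1, out);
        System.out.println(Files.readString(out, StandardCharsets.UTF_8).equals(xml)); // true
        Files.delete(in);
        Files.delete(out);
    }
}
```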
Answer by Oleksii Kyslytsyn
Switching the encoding used for the input might help in this case:
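// Assumes javax.xml.stream.* is imported and "in" is the InputStream being parsed.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLEventReader eventReader =
        inputFactory.createXMLEventReader(in,
                "utf-8"           // the encoding the bytes are actually in
                //"windows-1251"  // e.g. for Cyrillic files saved on Windows
        );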