Java: MalformedByteSequenceException (XML)

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1871340/

Time: 2020-10-29 18:19:02  Source: igfitidea

Tags: java, xml, utf-8, text

Asked by Nick Heiner

I'm trying to parse XML using this class. When I type out a simple file, it works fine.

<testData>
    <text>
        odp
    </text>
</testData>

Here is my main method:

public static void main(String[] args) { 
    Xml train = new Xml(args[0], "trainingData");
    Xml test = new Xml(args[1], "testData");
}

However, when I use the file I got by copying and pasting from MSFT Office OneNote, I get errors:

Exception in thread "main" java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at odp.compling.Xml.rootElement(Xml.java:41)
    at odp.compling.Xml.<init>(Xml.java:61)
    at odp.compling.ParseTreeAnalysis2.main(ParseTreeAnalysis2.java:10)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at odp.compling.Xml.rootElement(Xml.java:33)
    ... 2 more

What is causing this? I edited the problematic XML file in Notepad++ and changed the encoding to UTF-8. This caused a bunch of weird characters from the accents/special quotation marks, which I edited out. Am I not converting properly?

(I don't know anything about text encoding formats, in case you couldn't tell.)

Answered by ZZ Coder

Your file is not properly encoded as UTF-8 but your parser is expecting UTF-8 encoding.
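
One way to confirm this before handing the file to the parser is to decode its bytes strictly as UTF-8 and see whether that fails. The following is only a minimal sketch: the class name Utf8Check and the method isValidUtf8 are made up for illustration and are not part of the Xml class from the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not from the original question.
public class Utf8Check {

    /** Returns true if the given bytes decode cleanly as UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If this reports that the file is not valid UTF-8, the text was probably saved in a single-byte encoding (curly quotes pasted from Office applications are a common source) and needs to be re-saved as UTF-8, or the XML declaration needs to name the encoding the file actually uses.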

It would help to pinpoint the problem if you could post a hexdump of the file.
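
If no hexdump tool is at hand, a few lines of Java can produce one. This is a rough sketch, and the class name HexDump and its output layout (an 8-digit offset followed by up to 16 bytes per line) are my own choices for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not from the original question.
public class HexDump {

    /** Formats bytes as lines of "offset  b0 b1 ... b15" in hexadecimal. */
    public static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int offset = 0; offset < bytes.length; offset += 16) {
            sb.append(String.format("%08x ", offset));
            int end = Math.min(offset + 16, bytes.length);
            for (int i = offset; i < end; i++) {
                // Mask with 0xff so negative bytes print as two hex digits.
                sb.append(String.format(" %02x", bytes[i] & 0xff));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java HexDump <file>");
            return;
        }
        System.out.print(hexDump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

In the output, lone bytes in the range 80 to ff that are not part of a valid multi-byte sequence (for example 93 or 94 where a quotation mark should be) are the ones a UTF-8 parser will reject.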