Java: MalformedByteSequenceException (XML)

Question

Asked by Nick Heiner

I'm trying to parse XML using this class. When I type out a simple file, it works fine.

<testData>
    <text>
        odp
    </text>
</testData>

Here is my main method:

public static void main(String[] args) { 
    Xml train = new Xml(args[0], "trainingData");
    Xml test = new Xml(args[1], "testData");
}

However, when I use the file I got by copying and pasting from MSFT Office OneNote, I get errors:

Exception in thread "main" java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at odp.compling.Xml.rootElement(Xml.java:41)
    at odp.compling.Xml.<init>(Xml.java:61)
    at odp.compling.ParseTreeAnalysis2.main(ParseTreeAnalysis2.java:10)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at odp.compling.Xml.rootElement(Xml.java:33)
    ... 2 more

What is causing this? I edited the problematic XML file in Notepad++ and changed the encoding to UTF-8. This caused a bunch of weird characters from the accents/special quotation marks, which I edited out. Am I not converting properly?

(I don't know anything about text encoding formats, in case you couldn't tell.)

Answer 1

Answered by ZZ Coder

Your file is not properly encoded as UTF-8 but your parser is expecting UTF-8 encoding.
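
If the file turns out to be in a legacy single-byte encoding, one workaround is to decode it yourself and hand the parser characters instead of raw bytes. Below is a minimal sketch, assuming the file is windows-1252. That is only a guess based on the OneNote origin (bytes such as 0x93 and 0x94, the curly quotes in windows-1252, are not valid lead bytes in UTF-8), and the question does not confirm it. The class and method names `ParseWithCharset` and `parse` are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseWithCharset {

    /**
     * Parses XML from a byte stream, decoding it with the given charset.
     * Because the InputSource wraps a Reader, the parser reads characters
     * directly and does not try to decode the bytes as UTF-8 itself.
     */
    static Document parse(InputStream bytes, Charset charset) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Reader reader = new InputStreamReader(bytes, charset);
            return builder.parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file is windows-1252; change this if it is not.
        Charset assumed = Charset.forName("windows-1252");
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Document doc = parse(in, assumed);
            System.out.println(doc.getDocumentElement().getNodeName());
        }
    }
}
```

If the guess about the encoding is wrong, the parse may still succeed but accented characters and quotes will come out garbled, so it is worth checking the actual bytes first.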

It would help to pinpoint the problem if you could post a hexdump of the file.
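
If you don't have a hexdump tool handy, a few lines of Java can locate the offending byte. This is a rough sketch, not anything from the original code: `Utf8Check`, `firstInvalidOffset` and `hexAround` are hypothetical names. It runs a strict UTF-8 decoder over the file and reports the offset where decoding first fails, plus the bytes around it in hex.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if none. */
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // With REPORT, the bad sequence starts at the buffer's position.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // output buffer full: discard decoded chars and continue
        }
    }

    /** Formats the bytes within `radius` of `offset` as space-separated hex. */
    static String hexAround(byte[] data, int offset, int radius) {
        int from = Math.max(0, offset - radius);
        int to = Math.min(data.length, offset + radius + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            sb.append(String.format("%02X ", data[i] & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidOffset(data);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
        } else {
            System.out.println("First invalid byte at offset " + offset
                    + ": " + hexAround(data, offset, 8));
        }
    }
}
```

Run it with the path to the problematic XML file as the only argument and post the hex it prints; that is usually enough to tell which encoding the file is really in.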