Java: Ignoring the "Content is not allowed in trailing section" SAXException

Question

Asked by Paul J. Lucas

I'm using Java's DocumentBuilder.parse(InputStream) to parse an XML document. Occasionally, I get malformed XML documents in that there is extra junk after the final > that causes a SAXException: Content is not allowed in trailing section. (In the cases I've seen, the junk is simply one or more null bytes.)

I don't care what's after the final >. Is there an easy way to parse an entire XML document in Java and have it ignore any trailing junk?

Note that by "ignore" I don't simply mean to catch and ignore the exception: I mean to ignore the trailing junk, throw no exception, and return the Document object, since the XML up to and including the final > is valid.

Answer 1

Accepted answer by Don Roby

Since your sender is presenting you with invalid XML, it needs to be corrected before it hits the parser if you want to avoid this exception. If you can't correct the sender, you'll need a preprocessing step of some sort.

If the situation is simply that you've got extra null bytes after the closing tag, as indicated by one of your responses to another answer, this might be something you can accomplish easily by wrapping your input stream in a FilterInputStream that you implement to skip null bytes.
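
A minimal sketch of such a filter follows. The class and method names (NullSkippingDemo, NullSkippingInputStream, parseSkippingNulls) are hypothetical and chosen only for illustration. It drops every 0x00 byte, not just trailing ones, so it assumes the document uses an encoding such as UTF-8 or ISO-8859-1 in which a null byte never occurs legitimately; it would corrupt UTF-16 or UTF-32 input.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NullSkippingDemo {

    /** Hypothetical filter: removes every 0x00 byte from the wrapped stream. */
    static class NullSkippingInputStream extends FilterInputStream {
        NullSkippingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b;
            do {
                b = super.read();
            } while (b == 0); // keep reading past null bytes; -1 still signals EOF
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            while (true) {
                int n = super.read(buf, off, len);
                if (n <= 0) {
                    return n; // EOF (or nothing read)
                }
                // Compact the non-null bytes to the front of the region just read.
                int kept = 0;
                for (int i = 0; i < n; i++) {
                    byte b = buf[off + i];
                    if (b != 0) {
                        buf[off + kept++] = b;
                    }
                }
                if (kept > 0) {
                    return kept;
                }
                // The whole chunk was null bytes: read again rather than return 0.
            }
        }

        @Override
        public boolean markSupported() {
            return false; // mark/reset positions would not account for skipped bytes
        }
    }

    /** Parses the stream with null bytes filtered out (assumes a UTF-8-like encoding). */
    static Document parseSkippingNulls(InputStream raw) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new NullSkippingInputStream(raw));
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "<root><a/></root>\0\0\0".getBytes(StandardCharsets.UTF_8);
        Document doc = parseSkippingNulls(new ByteArrayInputStream(data));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Because the filter sits in front of the parser, DocumentBuilder never sees the trailing bytes and returns the Document normally, with no exception to catch.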

If the problem is more complex than just null characters, you'll of course need a more complex filter, which might be difficult.

If you're using a ContentHandler, you can add a callback to it so that it informs the calling code when the root element's end tag has been handled. The calling code's exception handler can then simply ignore the exception if the end has already been signalled. At that point, anything that had to be done by the parser has likely been done anyway. But this solution doesn't seem to apply to your situation.
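
For readers who are on the SAX API, the idea might look roughly like the sketch below. The names RootEndDemo, RootEndHandler, and parseTolerantly are hypothetical, and a simple flag on the handler stands in for the callback. It assumes the parser reports the root element's endElement event before it raises the error about trailing content. It does not produce a Document, which is why it does not fit the DocumentBuilder case in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class RootEndDemo {

    /** Hypothetical handler: records when the root element's end tag has been seen. */
    static class RootEndHandler extends DefaultHandler {
        private int depth = 0;
        private boolean rootClosed = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (--depth == 0) {
                rootClosed = true; // the document's real content is complete
            }
        }

        boolean isRootClosed() {
            return rootClosed;
        }
    }

    /**
     * Parses the bytes and swallows a parse error only if it occurred after
     * the root element was closed. Returns true if the root was fully handled.
     */
    static boolean parseTolerantly(byte[] xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RootEndHandler handler = new RootEndHandler();
        try {
            parser.parse(new ByteArrayInputStream(xml), handler);
        } catch (SAXParseException e) {
            if (!handler.isRootClosed()) {
                throw e; // the error is inside the document proper, so report it
            }
            // Otherwise the error concerns trailing junk only; ignore it.
        }
        return handler.isRootClosed();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "<root><a/></root>junk".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseTolerantly(data));
    }
}
```

Errors raised before the root element closes are rethrown, so genuinely broken documents are still rejected.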

Answer 2

Answer by Brett Kail

No. A document that contains trailing characters is not an XML document. Fix the sender.