Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6362926/

Date: 2020-08-16 05:19:46  Source: igfitidea

XML syntax validation in Java

Tags: java, xml, validation, syntax

Asked by Hristo

I've been trying to figure out how to check the syntax of an XML file: make sure all tags are closed, there are no random characters, and so on. All I care about at this point is making sure there is no broken XML in the file.

I've been looking at some SO posts like these...

... but I realized that I don't want to validate the structure of the XML file; I don't want to validate against an XML Schema (XSD). I just want to check the XML syntax and determine whether it is correct.

Accepted answer by James Allardice

You can check whether an XML document is well-formed using the following code:

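import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);      // no DTD validation, only a well-formedness check
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());
// parse() checks well-formedness and throws an exception if the document is malformed
Document document = builder.parse(new InputSource("document.xml"));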

The SimpleErrorHandler class referred to in the above code is as follows:

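import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Prints every problem the parser reports instead of handling it silently
public class SimpleErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }
}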

This came from this website, which provides various methods for validating XML with Java. Note also that this method loads an entire DOM tree into memory; see the comments for alternatives if you want to save on RAM.
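
One lower-memory option is a SAX parse, which streams through the document without building a tree. The following is only a minimal sketch, not code from the original answer: the class name SaxWellFormedCheck and the method isWellFormed are made up for illustration. It relies on DefaultHandler, whose fatalError method rethrows the parser's exception.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are assumptions for this sketch.
class SaxWellFormedCheck {

    // Returns true if the stream holds well-formed XML, false if the parser rejects it.
    static boolean isWellFormed(InputStream in) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);    // syntax check only, no DTD validation
        factory.setNamespaceAware(true);

        SAXParser parser;
        try {
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not create SAX parser", e);
        }

        try {
            // DefaultHandler ignores all content events, so nothing is kept in memory.
            parser.parse(in, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser reported a well-formedness problem.
            return false;
        } catch (IOException e) {
            // Reading the input failed; this is not a verdict about the XML itself.
            throw new UncheckedIOException(e);
        }
    }
}
```

To check a file, you could pass something like new FileInputStream("document.xml") to isWellFormed.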

Answer by nsfyn55

http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi/index.html Does this help? It uses XSD, which is pretty robust. Not only can you validate the document's structure, but you can also supply some pretty complex rules about what type of content your nodes and attributes can contain.
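
For reference, schema validation with the standard javax.xml.validation API might look roughly like the sketch below. The schema (a note element containing an integer priority) and the class name XsdValidationSketch are invented for illustration and do not come from the linked article. With no error handler set, the Validator throws on the first error, which the sketch treats as "invalid".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical example; the schema and all names are assumptions for this sketch.
class XsdValidationSketch {

    // Made-up schema: <note> must contain exactly one integer <priority>.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"note\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"priority\" type=\"xs:int\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if xml is well-formed and conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Throws SAXException on malformed input or on a schema violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this goes beyond what the question asks for: a document such as <note><priority>high</priority></note> is well-formed but fails this schema because "high" is not an integer.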

Answer by StaxMan

What you are asking is how to verify that a piece of content is a well-formed XML document. This is easily done by letting an XML parser (try to) parse the content in question: if there are issues, the parser will report an error by throwing an exception. There really isn't anything more to it, so all you need is to figure out how to parse an XML document.
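
As a rough illustration of "parse it and see whether an exception is thrown", the sketch below parses a string and reports where the first problem was found. The class name ParseErrorReport and the method firstProblem are hypothetical; the error handler simply rethrows so that the exception carries the line and column.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper; the names are assumptions for this sketch.
class ParseErrorReport {

    // Returns null if xml is well-formed, otherwise a description of the first problem.
    static String firstProblem(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow errors so the first one aborts the parse and reaches the catch below.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* warnings are ignored */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException e) {
            return String.valueOf(e.getMessage());
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as ParseErrorReport.firstProblem("<a><b></a>") returns a non-null message because the b element is never closed.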

About the only thing to beware of is that some libraries that claim to be XML parsers are not really proper parsers, in that they might not verify things that an XML parser must (as per the XML specification). In Java, Javolution is an example of something that does little to no checking; VTD-XML and XPP3 do some verification (but not all required checks). At the other end of the spectrum, Xerces and Woodstox check everything that the specification mandates. Xerces is bundled with the JDK, and most web service frameworks bundle Woodstox in addition.

Since the accepted answer already shows how to parse content into a DOM document (which starts with parsing), that might be enough. The only caveat is that this requires you to have 3-5x as much memory available as the raw size of the input document. To get around this limitation you could use a streaming parser, such as Woodstox (which implements the standard StAX API). If so, you would create an XMLStreamReader, and just call reader.next() as long as reader.hasNext() returns true.
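
The hasNext()/next() loop described above could be sketched as follows. This is an illustration rather than the answerer's own code: the class name StaxWellFormedCheck is made up, and XMLInputFactory.newInstance() returns whichever StAX implementation is available (the JDK's built-in one, or Woodstox if it is on the classpath).

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; the names are assumptions for this sketch.
class StaxWellFormedCheck {

    // Returns true if the whole input can be streamed through without a parse error.
    static boolean isWellFormed(Reader input) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = null;
        try {
            reader = factory.createXMLStreamReader(input);
            // Pull events one at a time; nothing is accumulated in memory.
            while (reader.hasNext()) {
                reader.next();
            }
            return true;
        } catch (XMLStreamException e) {
            // The parser hit a well-formedness problem.
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if closing fails.
                }
            }
        }
    }
}
```

For a file, you could pass a reader such as Files.newBufferedReader(Paths.get("document.xml")).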
