Java: Validating a huge XML file

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40663/

Date: 2020-10-29 10:49:11  Source: igfitidea

Validating a HUGE XML file

java xml validation xsd

Asked by Dan Cramer

I'm trying to find a way to validate a large XML file against an XSD. I saw the question ...best way to validate an XML... but the answers all pointed to using the Xerces library for validation. The only problem is that when I use that library to validate a 180 MB file, I get an OutOfMemoryError.

Are there any other tools, libraries, or strategies for validating a larger-than-normal XML file?

EDIT: The SAX solution worked for Java validation, but the other two suggestions for the libxml tool were also very helpful for validation outside of Java.

Answered by jodonnell

Instead of using a DOMParser, use a SAXParser. It reads from an input stream or reader, so you can keep the XML on disk instead of loading it all into memory.

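import java.io.FileReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);  // on its own this enables DTD validation; XSD validation also needs a schema configured
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
// SimpleErrorHandler is a user-supplied org.xml.sax.ErrorHandler (not shown here)
reader.setErrorHandler(new SimpleErrorHandler());
// The reader streams the file from disk while it parses
reader.parse(new InputSource(new FileReader("document.xml")));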
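
Since the question is specifically about validating against an XSD, here is a minimal sketch of one way to do that in a streaming fashion with the standard javax.xml.validation API, as an alternative to the SAXParser snippet above. It is not part of the original answer. The class name StreamingXsdValidator, the validate method, and the file arguments are made up for illustration, and the assumption is that the JDK's built-in validator reads a StreamSource incrementally instead of building a DOM tree.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; the name is illustrative only.
public class StreamingXsdValidator {

    /** Validates xmlFile against xsdFile and returns the problems found (empty if valid). */
    public static List<String> validate(File xmlFile, File xsdFile)
            throws SAXException, IOException {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(xsdFile);
        Validator validator = schema.newValidator();

        // Collect recoverable problems instead of stopping at the first one.
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }
            @Override public void error(SAXParseException e) {
                problems.add(describe("error", e));
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // malformed XML: parsing cannot continue
            }
        });

        // The document is handed over as a stream source, not as a DOM tree.
        validator.validate(new StreamSource(xmlFile));
        return problems;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StreamingXsdValidator document.xml schema.xsd
        List<String> problems = validate(new File(args[0]), new File(args[1]));
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "valid" : problems.size() + " problem(s)");
    }
}
```

Collecting errors in a list keeps validation going after the first schema violation, which tends to be more useful on a very large file than failing fast. Memory use can still grow if the schema relies on identity constraints, so treat this as a starting point.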

Answered by John Millikin

Use libxml, which performs validation and has a streaming mode.

Answered by dlamblin

Personally, I like to use XMLStarlet, which has a command-line interface and works on streams. It is a set of tools built on Libxml2.

Answered by GaZ

SAX and libxml will help, as already mentioned. You could also try increasing the JVM's maximum heap size with the -Xmx option. For example, to set the maximum heap size to 512 MB: java -Xmx512m com.foo.MyClass
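
To check whether an -Xmx setting actually took effect, a small sketch like the following can print the heap limit the running JVM reports. The class name HeapInfo and the method maxHeapMiB are hypothetical and not from the original answer; the value reported may differ somewhat from the figure passed to -Xmx.

```java
// Hypothetical helper class; the name is illustrative only.
public class HeapInfo {

    /** Returns the maximum amount of memory the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with, for example: java -Xmx512m HeapInfo
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```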
