JAXB - unmarshal OutOfMemory: Java Heap Space
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not the translator): StackOverflow
Original question: http://stackoverflow.com/questions/7968694/
Asked by TyC
I'm currently trying to use JAXB to unmarshal an XML file, but the file seems to be too large (~500 MB) for the unmarshaller to handle. I keep getting java.lang.OutOfMemoryError: Java heap space on:
Unmarshaller um = JAXBContext.newInstance("com.sample.xml").createUnmarshaller();
Export e = (Export) um.unmarshal(new File("SAMPLE.XML"));
I'm guessing this is because it's trying to load the entire XML file into memory as an object tree, but the file is just too large for the Java heap space.
Is there any more memory-efficient method of parsing large (~500 MB) XML files? Or perhaps an unmarshaller property that would help me handle the large XML file?
Here's what my XML looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Export xmlns="wwww.foo.com" xmlns:xsi="www.foo1.com" xsi:schemaLocation="www.foo2.com/.xsd">
    <Origin ID="foooo" />
    <WorkSets>
        <WorkSet>
            <Work>
                .....
            </Work>
            <Work>
                ....
            </Work>
            <Work>
                .....
            </Work>
        </WorkSet>
        <WorkSet>
            ....
        </WorkSet>
    </WorkSets>
</Export>
I'd like to unmarshal at the WorkSet level, while still being able to read through all of the Work elements for each WorkSet.
Answered by bdoughan
What does your XML look like? For large documents I typically recommend using a StAX XMLStreamReader so that JAXB can unmarshal the document in chunks.
input.xml
In the document below there are many instances of the person element. We can use JAXB with a StAX XMLStreamReader to unmarshal the corresponding Person objects one at a time, to avoid running out of memory.
<people>
    <person>
        <name>Jane Doe</name>
        <address>
            ...
        </address>
    </person>
    <person>
        <name>John Smith</name>
        <address>
            ...
        </address>
    </person>
    ....
</people>
Demo
import java.io.*;
import javax.xml.bind.*;
import javax.xml.stream.*;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
        xsr.nextTag(); // Advance to the people element

        JAXBContext jc = JAXBContext.newInstance(Person.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            Person person = (Person) unmarshaller.unmarshal(xsr);
            // Process one Person at a time, then let it be garbage collected.
        }
        xsr.close();
    }
}
Person
Instead of matching on the root element of the XML document, we need to add an @XmlRootElement annotation on the local root of the XML fragment that we will be unmarshalling from.
@XmlRootElement
public class Person {
}
Answered by Dave Newton
You could increase the heap space using the -Xmx startup argument.
For large files, SAX processing is more memory-efficient, since it is event-driven and does not load the entire structure into memory.
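As a minimal illustration of that event-driven style (the element names here are illustrative, not from the question's schema), a SAX handler can stream a document and react to each element without ever building a tree:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {

    // Stream the document and count occurrences of one element name.
    // Only the current event is held in memory, never the whole document.
    static int countElements(String xml, String elementName) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (elementName.equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><person><name>Jane Doe</name></person>"
                   + "<person><name>John Smith</name></person></people>";
        System.out.println("persons: " + countElements(xml, "person")); // prints "persons: 2"
    }
}
```

The same handler works unchanged on a 500 MB file streamed from disk, because memory use is bounded by the deepest element nesting, not the file size.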
Answered by Lolke Dijkstra
I've been doing a lot of research, in particular on parsing very large input sets conveniently. It's true that you can combine StAX and JAXB to selectively parse XML fragments, but that's not always possible or preferable. If you're interested in reading more on the topic, have a look at:
http://xml2java.net/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf
In this document I describe an alternative approach that is very straightforward and convenient to use. It parses arbitrarily large input sets while giving you access to your data in a JavaBeans fashion.
Answered by JB Nizet
Use SAX or StAX. But if the goal is to have an in-memory object representation of the file, you'll still need lots of memory to hold the contents of such a big file. In that case, your only hope is to increase the heap size using the -Xmx1024m JVM option (which sets the maximum heap size to 1024 MB).
Answered by JustTry
You can try this too. It's not exactly good practice, but it works :) who cares
http://amitsavm.blogspot.in/2015/02/partially-parsing-xml-using-jaxb-by.html
Otherwise use StAX or SAX; what Blaise Doughan suggests is also good, and you could call it the standard way. But if you have a complex XML structure and you don't want to annotate your classes manually or use the XJC tool, the approach above might be helpful.
Answered by AdrianS
Use SAX, but you will have to construct your Export object yourself.
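A minimal sketch of that idea. The Work and WorkSet classes and the ID attribute here are hypothetical, modeled loosely on the XML in the question; the asker's real Export type would need its own fields and handler logic:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Placeholder model classes shaped after the question's XML, not real code from the asker.
class Work {
    String id;
}

class WorkSet {
    final List<Work> works = new ArrayList<>();
}

public class ExportHandler extends DefaultHandler {

    final List<WorkSet> workSets = new ArrayList<>();
    private WorkSet current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("WorkSet".equals(qName)) {
            current = new WorkSet();      // open a new set
        } else if ("Work".equals(qName) && current != null) {
            Work w = new Work();
            w.id = attrs.getValue("ID");  // assumes a hypothetical ID attribute
            current.works.add(w);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("WorkSet".equals(qName)) {
            workSets.add(current);        // close the set; process or discard it here
            current = null;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Export><WorkSets>"
                   + "<WorkSet><Work ID=\"a\"/><Work ID=\"b\"/></WorkSet>"
                   + "<WorkSet><Work ID=\"c\"/></WorkSet>"
                   + "</WorkSets></Export>";
        ExportHandler h = new ExportHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        System.out.println(h.workSets.size() + " work sets"); // prints "2 work sets"
    }
}
```

To match the question's goal of working at the WorkSet level, you could process and drop each WorkSet inside endElement instead of accumulating them all, keeping memory use flat.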