Java: OutOfMemory when reading large amounts of data with Hibernate

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/2242999/

Date: 2020-10-29 20:16:50  Source: igfitidea

OutOfMemory when reading big amounts of data using hibernate

java hibernate orm out-of-memory batch-processing

Asked by Vladimir

I need to export a large amount of data from the database. Here are the classes that represent my data:


public class Product{
...

    @OneToMany
    @JoinColumn(name = "product_id")
    @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
    List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

... }

ProductHtmlSource contains the big string that I actually need to export.


Since the size of the exported data is bigger than the JVM memory, I'm reading the data in chunks, like this:


final int batchSize = 1000;      
for (int i = 0; i < 50; i++) {
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
  while (iterator.getScrollableResults().next()) {
     Product product = (Product) iterator.getScrollableResults().get(0); 
     List<String> htmls = product.getHtmlSources();
     <some processing>
  }

}


Code of getProductIterator:


public ScrollableResults getProductIterator(int offset, int limit) {
        Session session = getSession(true);
        session.setCacheMode(CacheMode.IGNORE);
        ScrollableResults iterator = session
                .createCriteria(Product.class)
                .add(Restrictions.eq("status", Product.Status.DONE))
                .setFirstResult(offset)
                .setMaxResults(limit)
                .scroll(ScrollMode.FORWARD_ONLY);
        session.flush();
        session.clear();

        return iterator;
    }

The problem is that, in spite of my clearing the session after reading each data chunk, Product objects accumulate somewhere and I get an OutOfMemory exception. The problem is not in the processing block of code; even without it I get the memory error. The batch size is also not a problem, since 1000 objects easily fit into memory.


The profiler showed that objects accumulate in the org.hibernate.engine.StatefulPersistenceContext class.


The stacktrace:


Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
    at java.lang.StringBuffer.append(StringBuffer.java:307)
    at org.hibernate.type.TextType.get(TextType.java:41)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
    at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
    at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
    at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
    at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
    at org.hibernate.loader.Loader.getRow(Loader.java:1206)
    at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
    at org.hibernate.loader.Loader.doQuery(Loader.java:701)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
    at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
    at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
    at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
    at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
    at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
    at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
    at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
    at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
    at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)

Accepted answer by KeithL

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() is expecting the starting row and a row count. As your "upper limit" gets higher you are reading data in bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().

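To make the off-by-a-batch concrete, here is a small, self-contained illustration (not from the original post; class and method names are made up) of the arguments the loop actually passes versus what was intended. Since setMaxResults() treats its argument as a row count, the buggy version reads 1000, 2000, ..., 50000 rows per chunk:

```java
public class BatchBounds {

    // Arguments the question's loop actually passes:
    // getProductIterator(batchSize * i, batchSize * (i + 1)).
    // The second value is handed to setMaxResults, which treats it as a row count.
    static int[] buggyArgs(int batchSize, int i) {
        return new int[] {batchSize * i, batchSize * (i + 1)};
    }

    // Intended call: a fixed row count per chunk.
    static int[] fixedArgs(int batchSize, int i) {
        return new int[] {batchSize * i, batchSize};
    }

    public static void main(String[] args) {
        int batchSize = 1000;
        for (int i = 0; i < 5; i++) {
            System.out.printf("chunk %d: buggy maxResults=%d, fixed maxResults=%d%n",
                    i, buggyArgs(batchSize, i)[1], fixedArgs(batchSize, i)[1]);
        }
    }
}
```

By chunk 49, the buggy call asks Hibernate for 50000 rows in one go, which explains why memory use grows with the loop index.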

Answered by Pascal Thivent

Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.

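A minimal sketch of what that could look like, assuming a configured Hibernate SessionFactory (the sessionFactory variable is illustrative): a StatelessSession keeps no persistence context, so entities cannot accumulate the way they did in StatefulPersistenceContext above. Note that lazy collections such as htmlSources are not initialized by a stateless session, so the query itself would have to fetch whatever is needed.

```java
// Sketch only: assumes a configured SessionFactory; identifiers are illustrative.
StatelessSession session = sessionFactory.openStatelessSession();
try {
    ScrollableResults results = session
            .createCriteria(Product.class)
            .add(Restrictions.eq("status", Product.Status.DONE))
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Product product = (Product) results.get(0);
        // process product; nothing is cached, so no evict/clear is needed
    }
    results.close();
} finally {
    session.close();
}
```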

Answered by Brian Deterling

KeithL is right - you're passing an ever-increasing limit. But breaking it up that way doesn't make sense anyway. The whole point of a scroll cursor is that you process a row at a time so there's no need to break it up into chunks. The fetch size reduces the trips to the database at the cost of using up more memory. The general pattern should be:


Criteria criteria = session.createCriteria(Product.class); // no offset or limit
criteria.setCacheMode(CacheMode.IGNORE); // prevent second-level caching of the results
criteria.setFetchSize(1000);  // experiment with this to optimize performance vs. memory
ScrollableResults iterator = criteria.scroll(ScrollMode.FORWARD_ONLY);
while (iterator.next()) {
  Product p = (Product) iterator.get(0);
  ...
  session.evict(p);  // required to keep objects from accumulating in the session
}

That said, the error is in getHtmlSources, so the problem may be completely unrelated to the session/cursor/scroll issue. If those HTML strings are huge and they're being referenced the entire time, you may just be running out of contiguous memory.

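If the column can be read outside Hibernate, one way around huge strings is to stream each value with plain JDBC instead of materializing it as a single String. This is only a sketch: the table and column names are guesses (they do not appear in the question), and connection/writer are assumed to exist:

```java
// Sketch: stream each large text value to the export target instead of
// building it as one in-memory String. Table/column names are assumed.
try (PreparedStatement ps = connection.prepareStatement(
         "select html from product_html_source where product_id = ?")) {
    ps.setLong(1, productId);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            try (Reader reader = rs.getCharacterStream("html")) {
                char[] buffer = new char[8192];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, n);  // writer is the export target
                }
            }
        }
    }
}
```

This keeps memory use bounded by the buffer size rather than by the largest HTML document.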

Btw, I don't see a getScrollableResults method on ScrollableResults.


Answered by mcottle

At the risk of appearing stupid - have you considered doing this another way?


Personally, I would avoid doing batch processing that "far away" from the database. I don't know what database you're using, but there's usually a mechanism for efficiently pulling a dataset out of the database and into a file, even if it involves moderately simple manipulation on the way out: stored procedures, specific export utilities. Investigate what else is available from your database vendor.

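As one hedged illustration of that idea, assuming PostgreSQL as the vendor: its COPY command writes the result set to a file on the database server itself, so the JVM never holds the data. The table, column, and path below are invented for the example, and server-side COPY TO a file requires appropriate database privileges:

```java
// Sketch only: PostgreSQL-specific, with assumed table/column/path names.
// COPY runs inside the database server and streams straight to disk,
// so the exporting JVM never materializes the rows.
try (Statement st = connection.createStatement()) {
    st.execute("COPY (SELECT html FROM product_html_source) TO '/tmp/html_export.tsv'");
}
```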

Answered by Padmarag

Can you post the Exception stacktrace? It may be solved by passing suitable JVM options for GC.

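For reference, heap-related JVM options are passed on the command line; a hedged example (the jar name is made up), including a flag that dumps the heap on OutOfMemoryError so a profiler can show exactly what accumulated:

```
# Raise the maximum heap and capture a heap dump when an OOME occurs.
java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/export.hprof -jar exporter.jar
```

Note that a bigger heap only delays the failure if objects are genuinely accumulating; the dump helps confirm where.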

I think this is related - Java StringBuilder huge overhead.


It looks from the stack trace that a very large String is being created, which causes the exception.
