Java: Massive insert with JPA + Hibernate
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/20285347/
Question by Rafael Afonso
I need to do a massive insert using EJB 3, Hibernate, Spring Data and Oracle. Originally I used Spring Data, with the code below:
talaoAITDAO.save(taloes);
where talaoAITDAO is a Spring Data JpaRepository subclass and taloes is a collection of TalaoAIT entities. The entity's ID is mapped like this:
@Id
@Column(name = "ID_TALAO_AIT")
@SequenceGenerator(name = "SQ_TALAO_AIT", sequenceName = "SQ_TALAO_AIT", allocationSize = 1000)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "SQ_TALAO_AIT")
private Long id;
The entity also has no related entities, so there are no cascaded inserts.
My problem is that every entity is inserted individually (as in INSERT INTO TABLE(col1, col2) VALUES (val1, val2)). Occasionally this causes a timeout, and all the insertions are rolled back. I would like to convert these individual inserts into batch inserts (such as INSERT INTO TABLE(col1, col2) VALUES (val11, val12), (val21, val22), (val31, val32), ...).
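For reference, this is roughly what JDBC batching (the mechanism hibernate.jdbc.batch_size switches on) does at the driver level: one single-row INSERT is prepared once, each row's parameters are queued with addBatch(), and executeBatch() hands the whole queue to the driver, typically in a single round trip. It does not rewrite the statements into one multi-row VALUES list (as far as I know, Oracle before 23c does not even accept that syntax), and Hibernate's show_sql output normally still prints one INSERT per entity while batching is active, so the console log alone does not tell you whether batching works. The sketch below uses only java.sql; the table name TALAO_AIT, the DESCRICAO column and the Row class are hypothetical stand-ins, not the asker's real schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Minimal JDBC batching sketch; table, column and class names are hypothetical. */
public class JdbcBatchInsertSketch {

    // Hypothetical row type standing in for the TalaoAIT entity.
    public static final class Row {
        final long id;
        final String descricao;

        public Row(long id, String descricao) {
            this.id = id;
            this.descricao = descricao;
        }
    }

    // A single-row INSERT; batching re-executes it with different parameters.
    static final String SQL =
            "INSERT INTO TALAO_AIT (ID_TALAO_AIT, DESCRICAO) VALUES (?, ?)";

    /** Inserts all rows and returns how many times executeBatch() was called. */
    public static int insertAll(Connection conn, List<Row> rows, int batchSize) throws SQLException {
        int batchesSent = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (Row row : rows) {
                ps.setLong(1, row.id);
                ps.setString(2, row.descricao);
                ps.addBatch();               // queue this parameter set, nothing is sent yet
                if (++pending == batchSize) {
                    ps.executeBatch();       // hand all queued rows to the driver together
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {               // send the last, partial batch
                ps.executeBatch();
                batchesSent++;
            }
        }
        return batchesSent;
    }
}
```

With 120 rows and a batch size of 50, the driver receives three batches (50, 50 and 20 rows) instead of 120 separate executions, although the SQL text of each statement is unchanged.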
While looking for ways to improve performance, I found this page in the Hibernate documentation, as well as Hibernate batch size confusion and this other page. Based on them, I wrote this code:
Session session = super.getEntityManager().unwrap(Session.class);
int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
    TalaoAIT talaoAIT = taloes.get(i);
    session.save(talaoAIT);
    // flush and clear after every full batch; (i + 1) avoids flushing the first entity on its own
    if ((i + 1) % batchSize == 0) {
        session.flush();
        session.clear();
    }
}
// flush the last, partial batch
session.flush();
session.clear();
Also, in persistence.xml, I added these properties:
<property name="hibernate.jdbc.batch_size" value="1000" />
<property name="order_inserts" value="true" />
However, although my tests showed a slight improvement (mainly with large collections and large batch sizes), it was not as big as I wanted. In the console log, I saw that Hibernate was still issuing individual inserts rather than replacing them with a massive insert. Since my entity uses a sequence generator, I believe that is not the problem (according to the Hibernate documentation, I would have a problem if I were using an identity generator).
So my question is: what could be missing here? Some configuration? Some method I am not using?
Thanks,
Rafael Afonso.
Answer by M. Deinum
A couple of things.
First, your configuration property is wrong: order_inserts must be hibernate.order_inserts. Currently that setting is ignored, so it changes nothing.
Next, use the EntityManager instead of doing all that nasty Hibernate-specific stuff. The EntityManager also has flush and clear methods. This should at least clean up your method. Even without the ordering setting, this helps a little: it cleans up the session and prevents dirty checks on all the objects held in it.
EntityManager em = getEntityManager();
int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
    TalaoAIT talaoAIT = taloes.get(i);
    em.persist(talaoAIT);
    // after every full batch, send the pending inserts and detach the persisted entities
    if ((i + 1) % batchSize == 0) {
        em.flush();
        em.clear();
    }
}
// flush whatever remains in the last, partial batch
em.flush();
em.clear();
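The persist/flush/clear loop can also be written as a small reusable helper whose batch boundary is easy to check. A minimal sketch, assuming the persist and flushAndClear callbacks stand in for em.persist(...) and em.flush(); em.clear(); so that the batching logic can be tested without a database. The BatchedPersist class and persistInBatches method are hypothetical names, not part of JPA or Hibernate.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: persists items and flushes/clears after each full batch. */
public final class BatchedPersist {

    private BatchedPersist() {}

    /**
     * Persists every item, runs flushAndClear after each full batch and once more
     * for a trailing partial batch. Returns how many times flushAndClear ran.
     * The list is only read, never modified.
     */
    public static <T> int persistInBatches(List<T> items, int batchSize,
                                           Consumer<T> persist, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        int flushes = 0;
        for (T item : items) {
            persist.accept(item);
            // count is the number of items persisted so far, so the first flush
            // happens after batchSize items rather than after the first one
            if (++count % batchSize == 0) {
                flushAndClear.run();
                flushes++;
            }
        }
        // flush the trailing partial batch, if any
        if (count % batchSize != 0) {
            flushAndClear.run();
            flushes++;
        }
        return flushes;
    }
}
```

Inside a transactional method it could be called as BatchedPersist.persistInBatches(taloes, 50, em::persist, () -> { em.flush(); em.clear(); }). With 120 entities and a batch size of 50 it flushes three times (after 50, after 100, and once for the last 20); a check of i % batchSize == 0 on a zero-based index would instead flush right after the very first entity and end up with four flushes.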
Next, don't make your batches too large, as that can cause memory problems; start with something like 50 and test which size performs best. Beyond a certain point, dirty checking takes more time than flushing and clearing to the database. You want to find that sweet spot.
Answer by Jim Tough
The solution posted by M. Deinum worked great for me, provided I set the following Hibernate properties in my JPA persistence.xml file:
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.jdbc.batch_versioned_data" value="true" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.cache.use_second_level_cache" value="false" />
<property name="hibernate.connection.autocommit" value="false" />
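If the persistence unit is bootstrapped in code instead of relying only on persistence.xml, the same settings can be supplied as a map, for example as the second argument of javax.persistence.Persistence.createEntityManagerFactory(String, Map). A minimal JDK-only sketch, assuming that kind of bootstrap; the BatchingProperties class and its method are hypothetical, and the actual factory call (which needs JPA and Hibernate on the classpath) appears only in the usage note. Every key carries the full "hibernate." prefix, which is exactly what the bare order_inserts in the question was missing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the batching-related Hibernate settings listed above. */
public final class BatchingProperties {

    private BatchingProperties() {}

    public static Map<String, String> hibernateBatchingProperties(int batchSize) {
        Map<String, String> props = new HashMap<>();
        // each key needs the full "hibernate." prefix; a bare "order_inserts" is ignored
        props.put("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        props.put("hibernate.jdbc.batch_versioned_data", "true");
        props.put("hibernate.order_inserts", "true");
        props.put("hibernate.order_updates", "true");
        props.put("hibernate.cache.use_second_level_cache", "false");
        props.put("hibernate.connection.autocommit", "false");
        return props;
    }
}
```

Usage, with "myUnit" as a hypothetical persistence unit name: EntityManagerFactory emf = Persistence.createEntityManagerFactory("myUnit", BatchingProperties.hibernateBatchingProperties(50));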
I am using an Oracle database, so I also have this one defined:
<property name="hibernate.dialect" value="org.hibernate.dialect.Oracle10gDialect" />
Answer by mm759
I recently found a promising small library for batching inserts with Hibernate and PostgreSQL. It is called pedal-dialect and uses the PostgreSQL command COPY, which many people claim is much faster than batched inserts (references: Postgresql manual, Postgresql Insert Strategies - Performance Test, How does copy work and why is it so much faster than insert?). pedal-dialect lets you use COPY without fully losing Hibernate's ease of use: you still get automatic mapping between entities and rows and don't have to implement it yourself.
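To illustrate what COPY consumes, here is a minimal sketch that serializes rows into the text expected by COPY ... FROM STDIN WITH (FORMAT csv) under PostgreSQL's default CSV settings (comma delimiter, double-quote quoting, unquoted empty field meaning NULL). It is not pedal-dialect's API, which is not shown here; the CopyCsv class and its methods are hypothetical. The resulting text could then be streamed to the server, for example with the PostgreSQL JDBC driver's CopyManager.copyIn(String, Reader).

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: formats rows for COPY ... FROM STDIN WITH (FORMAT csv). */
public final class CopyCsv {

    private CopyCsv() {}

    /** Formats one field using PostgreSQL's default CSV conventions. */
    public static String field(String value) {
        if (value == null) {
            return "";                          // an unquoted empty field is read as NULL
        }
        boolean mustQuote = value.isEmpty()     // a quoted "" stays an empty string, not NULL
                || value.contains(",")
                || value.contains("\"")
                || value.contains("\n")
                || value.contains("\r")
                || value.equals("\\.");         // keep a lone \. from looking like end-of-data
        if (!mustQuote) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";  // double any embedded quotes
    }

    /** Formats one row as a comma-separated line ending in a newline. */
    public static String row(List<String> values) {
        StringJoiner joiner = new StringJoiner(",", "", "\n");
        for (String v : values) {
            joiner.add(field(v));
        }
        return joiner.toString();
    }
}
```

For example, the values 1, a,b, null, say "hi" and an empty string become the line 1,"a,b",,"say ""hi""","" so the server can tell NULL apart from an empty string.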