oracle 使用 JPA/Toplink 批量插入

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/64781/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 17:10:47  来源:igfitidea点击:

Batch insert using JPA/Toplink

javaoraclejpatoplink

提问by Zizzencs

I have a web application that receives messages through an HTTP interface, e.g.:

我有一个通过 HTTP 接口接收消息的 Web 应用程序,例如:

http://server/application?source=123&destination=234&text=hello

This request contains the ID of the sender, the ID of the recipient and the text of the message.

该请求包含发件人 ID、收件人 ID 和消息文本。

This message should be processed like:

此消息应按如下方式处理:

  • finding the matching User object for both the source and the destination from the database
  • creating a tree of objects: a Message that contains a field for the message text and two User objects for the source and the destination
  • persisting this tree to a database.
  • 从数据库中为源和目标查找匹配的 User 对象
  • 创建对象树:包含消息文本字段的消息和源和目标的两个用户对象
  • 将此树持久化到数据库中。

The tree will be loaded by other applications that I can't touch.

该树将由我无法触及的其他应用程序加载。

I use Oracle as the backing database and JPA with Toplink for the database handling tasks. If possible, I'd stay with these.

我使用 Oracle 作为后备数据库,使用带有 Toplink 的 JPA 作为数据库处理任务。如果可能,我会留下这些。

Without much optimization I can achieve ~30 requests/sec throughput in my environment. That's not much, I'd require ~300 requests/sec. So I measured where the performance bottleneck is and found that the calls to em.persist()takes most of the time. If I simply comment out that line, the throughput go well over 1000 requests/sec.

如果没有太多优化,我可以在我的环境中实现约 30 个请求/秒的吞吐量。这并不多,我需要大约 300 个请求/秒。所以我测量了性能瓶颈在哪里,发现em.persist()大部分时间都在调用。如果我简单地注释掉该行,吞吐量将远远超过 1000 个请求/秒。

I tried to write a small test application that used simple JDBC calls to persist 1 million messages to the same database. I used batching, meaning I did 100 inserts then a commit, and repeated until all the records was in the database. I measured ~500 requests/sec throughput in this scenario, that would meet my needs.

我尝试编写一个小型测试应用程序,该应用程序使用简单的 JDBC 调用将 100 万条消息持久化到同一个数据库。我使用了批处理,这意味着我做了 100 次插入然后提交,并重复直到所有记录都在数据库中。在这种情况下,我测量了大约 500 个请求/秒的吞吐量,这将满足我的需求。

It is clear that I need to optimize insert performance here. However as I mentioned earlier I would like to keep using JPA and Toplink for this, not pure JDBC.

很明显,我需要在这里优化插入性能。但是,正如我之前提到的,我想为此继续使用 JPA 和 Toplink,而不是纯 JDBC。

Do you know a way to create batch inserts with JPA and Toplink? Can you recommend any other technique for improving JPA persist performance?

您知道使用 JPA 和 Toplink 创建批量插入的方法吗?你能推荐任何其他技术来提高 JPA 持久性能吗?

ADDITIONAL INFO:

附加信息:

"requests/sec" means here: total number of requests / total time from beginning of test to last record written to database.

“请求/秒”在这里的意思是:请求总数/从测试开始到写入数据库的最后一条记录的总时间。

I tried to make the calls to em.persist()asynchronous by creating an in-memory queue between the servlet stuff and the persister. It helped the performance greatly. However the queue did grow really fast and as the application will receive ~200 requests/second continuously, It is not an acceptable solution for me.

我试图em.persist()通过在 servlet 内容和持久程序之间创建一个内存中队列来调用异步。它对表演有很大帮助。然而,队列确实增长得非常快,并且由于应用程序每秒将连续收到约 200 个请求,这对我来说不是一个可接受的解决方案。

In this decoupled approach I collected requests for 100 msec and called em.persist()on all collected items before commiting the transaction. The EntityManagerFactory is cached between each transaction.

在这种解耦方法中,我收集了 100 毫秒的请求,并em.persist()在提交事务之前调用了所有收集的项目。EntityManagerFactory 在每个事务之间缓存。

回答by Christian

You should decouple from the JPA interface and use the bare TopLink API. You can probably chuck the objects you're persisting into a UnitOfWork and commit the UnitOfWork on your schedule (sync or async). Note that one of the costs of em.persist() is the implicit clone that happens of the whole object graph. TopLink will work rather better if you uow.registerObject() your two user objects yourself, saving itself the identity tests it has to otherwise do. So you'll end up with:

您应该与 JPA 接口分离并使用裸 TopLink API。您可能可以将要持久化的对象放入 UnitOfWork 并按计划(同步或异步)提交 UnitOfWork。请注意, em.persist() 的成本之一是整个对象图发生的隐式克隆。如果您自己使用 uow.registerObject() 两个用户对象,TopLink 将工作得更好,从而节省了它必须执行的身份测试。所以你最终会得到:

uow=sess.acquireUnitOfWork();
for (job in batch) {
 thingyCl=uow.registerObject(new Thingy());
 user1Cl=uow.registerObject(user1);
 user2Cl=uow.registerObject(user2);
 thingyCl.setUsers(user1Cl,user2Cl);
}
uow.commit();

This is very old school TopLink btw ;)

顺便说一句,这是非常老派的 TopLink ;)

Note that the batch will help a lot, because batch writing and more especially batch writing with parameter binding will kick in which for this simple example will probably have a very large impact on your performance.

请注意,批处理将有很大帮助,因为批处理写入,尤其是带有参数绑定的批处理写入将启动,对于这个简单的示例,它可能会对您的性能产​​生非常大的影响。

Other things to look for: your sequencing size. A lot of the time spent writing objects in TopLink is actually spent reading sequencing information from the database, especially with the small defaults (I would probably have several hundred or even more as my sequence size).

其他需要注意的事项:您的测序大小。在 TopLink 中编写对象所花费的大量时间实际上是用于从数据库中读取排序信息,尤其是对于较小的默认值(我的序列大小可能有数百甚至更多)。

回答by davetron5000

What is your measure of "requests/sec"? In other words, what happens for the 31st request? What resource is being blocked? If it is the front-end/servlet/web portion, can you run em.persist() in another thread and return immediately?

您对“请求/秒”的衡量标准是什么?换句话说,第 31 个请求会发生什么?什么资源被阻塞?如果是前端/servlet/web部分,能不能在另一个线程中运行em.persist()立即返回?

Also, are you creating transactions each time? Are you creating EntityManagerFactory objects with each request?

另外,您每次都创建交易吗?您是否为每个请求创建 EntityManagerFactory 对象?