Java Spring Data JPA: Batch insert for nested entities

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/35791383/

Spring Data JPA: Batch insert for nested entities

java, hibernate, spring-data-jpa

Asked by Ahatius

I have a test case where I need to persist 100'000 entity instances into the database. The code I'm currently using does this, but it takes up to 40 seconds until all the data is persisted in the database. The data is read from a JSON file which is about 15 MB in size.

I had previously implemented a batch-insert method in a custom repository for another project. In that case, however, I had many top-level entities to persist, with only a few nested entities.

In my current case I have 5 Job entities, each containing a List of roughly 30 JobDetail entities. One JobDetail contains between 850 and 1100 JobEnvelope entities.

When writing to the database I commit the List of Job entities with the default save(Iterable<Job> jobs) interface method. All nested entities have CascadeType.PERSIST. Each entity has its own table.
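
For reference, a minimal sketch of what this looks like (the JobRepository interface matches the question; the service wrapper, its names and the @Transactional boundary are assumptions added for illustration):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

public interface JobRepository extends JpaRepository<Job, Integer> {
  // inherits save(Iterable<Job>) in Spring Data JPA 1.x (renamed saveAll in 2.x)
}

@Service
class JobImportService {
  @Autowired
  private JobRepository jobRepository;

  @Transactional
  public void importJobs(List<Job> jobs) {
    // one call persists the Jobs and cascades to JobDetails and JobEnvelopes
    jobRepository.save(jobs);
  }
}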

The usual way to enable batch inserts would be to implement a custom method like saveBatch that flushes every once in a while. But my problem in this case is the JobEnvelope entities: I don't persist them through a JobEnvelope repository; instead I let the Job entity's repository handle them. I'm using MariaDB as the database server.

So my question boils down to the following: how can I make the JobRepository insert its nested entities in batches?

These are my 3 entities in question:

Job

import java.util.Collection;
import javax.persistence.*;
import com.fasterxml.jackson.annotation.JsonManagedReference;

@Entity
public class Job {
  @Id
  @GeneratedValue
  private int jobId;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "job")
  @JsonManagedReference
  private Collection<JobDetail> jobDetails;
}

JobDetail

import java.util.List;
import javax.persistence.*;
import com.fasterxml.jackson.annotation.JsonBackReference;
import com.fasterxml.jackson.annotation.JsonManagedReference;

@Entity
public class JobDetail {
  @Id
  @GeneratedValue
  private int jobDetailId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobId")
  @JsonBackReference
  private Job job;

  @OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST, mappedBy = "jobDetail")
  @JsonManagedReference
  private List<JobEnvelope> jobEnvelopes;
}

JobEnvelope

import javax.persistence.*;

@Entity
public class JobEnvelope {
  @Id
  @GeneratedValue
  private int jobEnvelopeId;

  @ManyToOne(fetch = FetchType.EAGER, cascade = CascadeType.PERSIST)
  @JoinColumn(name = "jobDetailId")
  private JobDetail jobDetail;

  private double weight;
}

Accepted answer by Dragan Bozanovic

Make sure to configure Hibernate batch-related properties properly:

<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>

The point is that successive statements can be batched only if they manipulate the same table. If a statement inserts into a different table, the current batch must be closed and executed before that statement. With the hibernate.order_inserts property you give Hibernate permission to reorder inserts before constructing batch statements (hibernate.order_updates has the same effect for update statements).

jdbc.batch_size is the maximum batch size that Hibernate will use. Test different values and pick the one that shows the best performance in your use cases.

Note that batching of insert statements is disabled if the IDENTITY id generator is used.
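
The entities above rely on a plain @GeneratedValue, which may resolve to IDENTITY on MySQL/MariaDB. A sketch of a sequence-based mapping that keeps insert batching possible, assuming the database supports sequences (MariaDB does since 10.3; the generator and sequence names here are made up for illustration):

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "job_seq")
@SequenceGenerator(name = "job_seq", sequenceName = "job_seq", allocationSize = 100)
// allocationSize lets Hibernate hand out 100 ids per sequence round trip
private int jobId;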

Specific to MySQL, you have to specify rewriteBatchedStatements=true as part of the connection URL. To make sure that batching is working as expected, add profileSQL=true to inspect the SQL the driver sends to the database. More details here.
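
As an example, a connection URL carrying both flags might look like this (host, port and database name are placeholders):

jdbc:mysql://localhost:3306/jobdb?rewriteBatchedStatements=true&profileSQL=true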

If your entities are versioned (for optimistic locking purposes), then in order to utilize batch updates (this doesn't impact inserts) you will also have to turn on:

<property name="hibernate.jdbc.batch_versioned_data">true</property>

With this property you tell Hibernate that the JDBC driver is capable of returning the correct count of affected rows when executing a batch update (needed to perform the version check). You have to check whether this works properly for your database/JDBC driver. For example, it does not work in Oracle 11 and older Oracle versions.
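
For context, "versioned" here means the entity carries a JPA @Version field, along these lines (illustrative only; the entities in the question are not versioned):

@Version
private long version;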

You may also want to flush and clear the persistence context after each batch to release memory; otherwise all of the managed objects remain in the persistence context until it is closed.
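
A minimal sketch of such a flush-and-clear loop, in the spirit of the saveBatch method mentioned in the question (the class name, the injected EntityManager and the BATCH_SIZE constant are assumptions; the chunk size should match hibernate.jdbc.batch_size):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

public class JobBatchRepositoryImpl {

  private static final int BATCH_SIZE = 100;

  @PersistenceContext
  private EntityManager entityManager;

  // must run inside an active transaction
  public void saveBatch(List<Job> jobs) {
    for (int i = 0; i < jobs.size(); i++) {
      entityManager.persist(jobs.get(i)); // cascades to JobDetails and JobEnvelopes
      if ((i + 1) % BATCH_SIZE == 0) {
        entityManager.flush(); // push the queued inserts to the JDBC driver
        entityManager.clear(); // detach managed entities to release memory
      }
    }
    entityManager.flush(); // flush the final, partial chunk
    entityManager.clear();
  }
}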

Also, you may find this blog useful, as it nicely explains the details of the Hibernate batching mechanism.