Spring Data JPA - concurrent bulk inserts/updates
Note: this page is based on a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/36356473/
Question by JuHarm89
At the moment I am developing a Spring Boot application that mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review is uniquely identified by its reviewIdentifier (a String), which is the primary key, and a review can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:
import java.io.Serializable;
import java.util.Set;
import javax.persistence.*;

@Entity
public class ProductPlacement implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "product_placement_id")
    private long id;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
    private Set<CustomerReview> customerReviews;

    // getters and setters omitted
}
import java.io.Serializable;
import java.util.Set;
import javax.persistence.*;

@Entity
public class CustomerReview implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "customer_review_id")
    private String reviewIdentifier;

    @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    @JoinTable(
        name = "tb_miner_review_to_product",
        joinColumns = @JoinColumn(name = "customer_review_id"),
        inverseJoinColumns = @JoinColumn(name = "product_placement_id")
    )
    private Set<ProductPlacement> productPlacements;

    // getters and setters omitted
}
One message from the queue contains 1-15 reviews and a productPlacementId. I want an efficient way to persist the reviews for that product. There are basically two cases to consider for each incoming review:
- The review is not in the database -> insert the review with a reference to the product contained in the message.
- The review is already in the database -> just add the product reference to the productPlacements set of the existing review.
My current method for persisting the reviews is not optimal. It looks like this (using Spring Data JpaRepositories):
@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    // findOne(id) is the Spring Data 1.x API (replaced by findById(id), returning an Optional, in 2.x)
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for (CustomerReview review : customerReviews) {
        // Check-then-act: another consumer may insert the same review between this lookup and the save below
        CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
        if (cr != null) {
            // Review already exists: just link it to the product
            cr.getProductPlacements().add(placement);
            customerReviewRepository.saveAndFlush(cr);
        } else {
            // New review: insert it with a reference to the product
            Set<ProductPlacement> productPlacements = new HashSet<>();
            productPlacements.add(placement);
            review.setProductPlacements(productPlacements);
            cr = review;
            customerReviewRepository.saveAndFlush(cr);
        }
    }
}
Questions:
- I sometimes get ConstraintViolationExceptions because the unique constraint on "reviewIdentifier" is violated. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
- Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will that greatly increase memory usage?
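Question 1 describes a classic check-then-act race: two consumers can both run findOne, both see "not found", and both insert the same reviewIdentifier, so one of them violates the primary-key constraint. The stdlib-only sketch below is a hypothetical in-memory stand-in for the table (ReviewUpsertSketch, upsert and placementsOf are illustrative names, not JPA or Spring APIs). It only shows the principle of letting one atomic operation decide between insert and update per key; at the database level, approaches worth evaluating are relying on the unique constraint and retrying the failed message, or a native MySQL INSERT ... ON DUPLICATE KEY UPDATE.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for the customer_review table (not JPA, not a database).
// Key = reviewIdentifier (primary key), value = ids of the linked product placements.
public class ReviewUpsertSketch {
    private final Map<String, Set<Long>> table = new ConcurrentHashMap<>();

    // Insert-or-link as ONE atomic step per key: ConcurrentHashMap.compute runs the whole
    // remapping function atomically, so two consumers can never both see "missing" and both insert.
    public void upsert(String reviewIdentifier, long placementId) {
        table.compute(reviewIdentifier, (id, placements) -> {
            Set<Long> result = (placements == null) ? new HashSet<>() : placements; // new review: empty link set
            result.add(placementId);                                                // link review to this placement
            return result;
        });
    }

    public Set<Long> placementsOf(String reviewIdentifier) {
        return table.get(reviewIdentifier);
    }

    public int rowCount() {
        return table.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ReviewUpsertSketch store = new ReviewUpsertSketch();
        ExecutorService consumers = Executors.newFixedThreadPool(5); // ~5 concurrent consumers
        for (long placement = 1; placement <= 5; placement++) {
            final long p = placement;
            // Each consumer delivers the same review for a different product placement.
            consumers.submit(() -> store.upsert("R-1", p));
        }
        consumers.shutdown();
        consumers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(store.rowCount());                         // 1
        System.out.println(new TreeSet<>(store.placementsOf("R-1"))); // [1, 2, 3, 4, 5]
    }
}
```

However the five consumers interleave, exactly one "row" is created and it ends up linked to all five placements, which is the behavior the findOne-then-save code cannot guarantee.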
Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?
@Lock(LockModeType.PESSIMISTIC_WRITE)
CustomerReview findByReviewIdentifier(String reviewIdentifier);
What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even if the method returns null?
Thank you!
Answer by Madhusudana Reddy Sunnapu
From a performance point of view, I would consider evaluating the solution with the following changes.
- Changing from bidirectional ManyToMany to bidirectional OneToMany
I had the same question about which of the two mappings results in more efficient DML statements. Quoting from "Typical ManyToMany mapping versus two OneToMany":
The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.
Use the second option because whenever the associations are controlled by @ManyToOne associations, the DML statements are always the most efficient ones.
- Enable the batching of DML statements
Enabling batching support results in fewer round trips to the database to insert/update the same number of records.
Quoting from "batch INSERT and UPDATE statements":
hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true
- Reduce the number of saveAndFlush calls
The current code gets the ProductPlacement and, for each review, does a saveAndFlush, which results in no batching of DML statements.
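To see why one saveAndFlush per review defeats hibernate.jdbc.batch_size, here is a deliberately simplified, hypothetical model (BatchingModel and its methods are illustrative, not Hibernate classes). It assumes a JDBC batch holds consecutive statements for the same table, is sent when it reaches batch_size or when a different statement arrives, and never spans a flush; it also assumes that within one flush the customer_review inserts run before the tb_miner_review_to_product (join-table) inserts, and that each review in the message is new.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of JDBC batching within Hibernate flushes (not Hibernate code).
public class BatchingModel {

    // Counts JDBC batches in one flush: a batch holds consecutive statements for the same table, up to batchSize.
    public static int batchesForFlush(List<String> statements, int batchSize) {
        int batches = 0;
        String currentTable = null;
        int inBatch = 0;
        for (String table : statements) {
            // A different statement, or a full batch, forces the current batch to be sent.
            if (!table.equals(currentTable) || inBatch == batchSize) {
                batches++;
                currentTable = table;
                inBatch = 0;
            }
            inBatch++;
        }
        return batches;
    }

    // Statements one flush emits for n new reviews: entity inserts first, then join-table inserts.
    public static List<String> flushStatements(int newReviews) {
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < newReviews; i++) statements.add("customer_review");
        for (int i = 0; i < newReviews; i++) statements.add("tb_miner_review_to_product");
        return statements;
    }

    // saveAndFlush per review: every review is its own flush, so no batch ever holds more than one row.
    public static int batchesWithFlushPerReview(int newReviews, int batchSize) {
        int total = 0;
        for (int i = 0; i < newReviews; i++) total += batchesForFlush(flushStatements(1), batchSize);
        return total;
    }

    // One flush at commit time for the whole message.
    public static int batchesWithSingleFlush(int newReviews, int batchSize) {
        return batchesForFlush(flushStatements(newReviews), batchSize);
    }

    public static void main(String[] args) {
        int batchSize = 50; // hibernate.jdbc.batch_size from the answer
        int reviews = 15;   // largest message size mentioned in the question
        System.out.println(batchesWithFlushPerReview(reviews, batchSize)); // 30
        System.out.println(batchesWithSingleFlush(reviews, batchSize));    // 2
    }
}
```

Under these assumptions a 15-review message needs 30 separate statement executions with saveAndFlush per review, versus 2 JDBC batches with a single flush. The model ignores the SELECTs issued by findOne and driver behavior; with MySQL Connector/J, whether one JDBC batch really becomes one network round trip also depends on the rewriteBatchedStatements connection property, so treat the numbers as an illustration to check against real measurements.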
Instead, I would consider loading the ProductPlacement entity, adding the List<CustomerReview> customerReviews to its Set<CustomerReview> customerReviews field, and finally calling merge once at the end, with these two changes:
- Make the ProductPlacement entity the owner of the association, i.e., move the mappedBy attribute onto the Set<ProductPlacement> productPlacements field of the CustomerReview entity (the @JoinTable mapping then moves to the customerReviews field of ProductPlacement, since it belongs on the owning side).
- Make the CustomerReview entity implement equals and hashCode using the reviewIdentifier field. I believe reviewIdentifier is unique and user-assigned.
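The equals/hashCode change matters because this approach adds incoming reviews to a Set<CustomerReview>: with the default identity-based equals, the same review arriving again in a new message is a different Java object and would not be recognized as a duplicate. Below is a plain-Java sketch of CustomerReview's identity only, assuming (as the answer does) that reviewIdentifier is unique and user-assigned; the JPA annotations, the no-arg constructor JPA requires, and the productPlacements association are left out so that it compiles with the JDK alone.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java sketch of CustomerReview's identity only. JPA annotations, the no-arg
// constructor JPA requires and the productPlacements association are omitted so this
// compiles with the JDK alone. Assumes reviewIdentifier is unique and user-assigned.
public class CustomerReview {
    private final String reviewIdentifier;

    public CustomerReview(String reviewIdentifier) {
        this.reviewIdentifier = Objects.requireNonNull(reviewIdentifier, "reviewIdentifier");
    }

    public String getReviewIdentifier() {
        return reviewIdentifier;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof rather than getClass(), and the getter rather than the field,
        // so that a Hibernate proxy subclass of the real entity still compares equal.
        if (!(o instanceof CustomerReview)) return false;
        return reviewIdentifier.equals(((CustomerReview) o).getReviewIdentifier());
    }

    @Override
    public int hashCode() {
        // Consistent with equals: based only on the business key.
        return reviewIdentifier.hashCode();
    }

    public static void main(String[] args) {
        Set<CustomerReview> customerReviews = new HashSet<>();
        customerReviews.add(new CustomerReview("R-1"));
        // The same review arriving again in a later message is recognized as a duplicate.
        boolean added = customerReviews.add(new CustomerReview("R-1"));
        System.out.println(added + " " + customerReviews.size()); // false 1
    }
}
```

Basing both methods on the immutable, user-assigned key keeps the hash code stable before and after persisting, which would not hold for a database-generated id.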
Finally, before tuning performance with these changes, establish a baseline with the current code. Then make the changes and check whether they really bring a significant performance improvement for your solution.