java Spring Data JPA - Concurrent batch insert/update

Question

Question by JuHarm89

At the moment I am developing a Spring Boot application that mainly pulls product review data from a message queue (~5 concurrent consumers) and stores it in a MySQL DB. Each review is uniquely identified by its reviewIdentifier (String), which is the primary key, and it can belong to one or more products (e.g. products with different colors). Here is an excerpt of the data model:

import java.io.Serializable;
import java.util.Set;
import javax.persistence.*;

// ProductPlacement.java
@Entity
public class ProductPlacement implements Serializable {

   private static final long serialVersionUID = 1L;

   @Id
   @GeneratedValue(strategy = GenerationType.AUTO)
   @Column(name = "product_placement_id")
   private long id;

   // Inverse side: the association is owned by CustomerReview.productPlacements
   @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "productPlacements")
   private Set<CustomerReview> customerReviews;
}

// CustomerReview.java
@Entity
public class CustomerReview implements Serializable {

   private static final long serialVersionUID = 1L;

   // Application-assigned primary key
   @Id
   @Column(name = "customer_review_id")
   private String reviewIdentifier;

   // Owning side: writes the rows of the join table
   @ManyToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
   @JoinTable(
           name = "tb_miner_review_to_product",
           joinColumns = @JoinColumn(name = "customer_review_id"),
           inverseJoinColumns = @JoinColumn(name = "product_placement_id")
   )
   private Set<ProductPlacement> productPlacements;
}

One message from the queue contains 1-15 reviews and a productPlacementId. Now I want an efficient way to persist the reviews for the product. There are basically two cases to consider for each incoming review:

The review is not in the database -> insert the review with a reference to the product contained in the message.
The review is already in the database -> just add the product reference to the productPlacements set of the existing review.

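The two cases above can be sketched in plain Java. This is a minimal in-memory sketch under stated assumptions, not the asker's code: a HashMap stands in for the review table, and the Review class and the findAllById helper are hypothetical (JPA annotations omitted). It shows one way to replace the per-review findOne call with a single bulk lookup per message; Spring Data repositories offer such a lookup as findAll(Iterable<ID>) in 1.x and findAllById(Iterable<ID>) in 2.x. It does not by itself remove the race described in question 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReviewUpsertSketch {

    // Hypothetical stand-in for CustomerReview (JPA annotations omitted).
    static class Review {
        final String reviewIdentifier;
        final Set<Long> productPlacementIds = new HashSet<>();

        Review(String reviewIdentifier) {
            this.reviewIdentifier = reviewIdentifier;
        }
    }

    // Hypothetical helper standing in for ONE "WHERE id IN (...)" query.
    static Map<String, Review> findAllById(Map<String, Review> table, Set<String> ids) {
        Map<String, Review> found = new HashMap<>();
        for (String id : ids) {
            Review r = table.get(id);
            if (r != null) {
                found.put(id, r);
            }
        }
        return found;
    }

    // Applies one queue message; returns the reviews that had to be inserted (case 1).
    static List<Review> applyMessage(Map<String, Review> table, List<Review> incoming, long placementId) {
        Set<String> ids = new HashSet<>();
        for (Review r : incoming) {
            ids.add(r.reviewIdentifier);
        }
        Map<String, Review> known = findAllById(table, ids); // one lookup per message

        List<Review> inserted = new ArrayList<>();
        for (Review review : incoming) {
            Review current = known.get(review.reviewIdentifier);
            if (current != null) {
                // Case 2: review already stored -> only add the product reference.
                current.productPlacementIds.add(placementId);
            } else {
                // Case 1: new review -> attach the placement and insert it.
                review.productPlacementIds.add(placementId);
                table.put(review.reviewIdentifier, review);
                known.put(review.reviewIdentifier, review); // handles the same id twice in one message
                inserted.add(review);
            }
        }
        return inserted;
    }

    public static void main(String[] args) {
        Map<String, Review> table = new HashMap<>();
        Review stored = new Review("R1");
        stored.productPlacementIds.add(7L);
        table.put("R1", stored);

        List<Review> inserted = applyMessage(table, Arrays.asList(new Review("R1"), new Review("R2")), 42L);
        System.out.println(inserted.size()); // 1 (only R2 was new)
        System.out.println(table.get("R1").productPlacementIds.contains(42L)); // true
    }
}
```

Tracking newly inserted reviews in the same map also covers a message that happens to contain the same reviewIdentifier twice.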

Currently my method for persisting the reviews is not optimal. It looks as follows (it uses Spring Data JpaRepositories):

@Override
@Transactional
public void saveAllReviews(List<CustomerReview> customerReviews, long productPlacementId) {
    ProductPlacement placement = productPlacementRepository.findOne(productPlacementId);
    for(CustomerReview review: customerReviews){
        CustomerReview cr = customerReviewRepository.findOne(review.getReviewIdentifier());
        if (cr!=null){
            cr.getProductPlacements().add(placement);
            customerReviewRepository.saveAndFlush(cr);
        }   
        else{
            Set<ProductPlacement> productPlacements = new HashSet<>();
            productPlacements.add(placement);
            review.setProductPlacements(productPlacements);
            cr = review;
            customerReviewRepository.saveAndFlush(cr);
        }

    }
}

Questions:

I sometimes get ConstraintViolationExceptions for violating the unique constraint on reviewIdentifier. This is obviously because I (concurrently) check whether the review is already present and then insert or update it. How can I avoid that?
Is it better to use save() or saveAndFlush() in my case? I get ~50-80 reviews per second. Will Hibernate flush automatically if I just use save(), or will that greatly increase memory usage?

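The race in question 1 is a check-then-act problem: two consumers can both see "not present" and both insert the same reviewIdentifier. One common mitigation (not discussed in this thread) is to let the unique constraint decide: attempt the insert, and if it fails with a duplicate-key error, fall back to the update path. The sketch below models this in memory under stated assumptions: a ConcurrentHashMap stands in for the review table, and UniqueViolation, insert, addPlacement and upsert are hypothetical names. In a Spring/JPA application the violation would typically surface as a DataIntegrityViolationException, and because the failed transaction is marked for rollback, the retry has to run in a new transaction.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UpsertRetrySketch {

    // Hypothetical stand-in for the unique-constraint violation reported by the database.
    static class UniqueViolation extends RuntimeException {
        UniqueViolation(String id) {
            super("duplicate key: " + id);
        }
    }

    // In-memory stand-in for the review table: reviewIdentifier -> product placement ids.
    static final Map<String, Set<Long>> TABLE = new ConcurrentHashMap<>();

    // INSERT: fails like a primary-key violation if another consumer inserted the row first.
    static void insert(String reviewId, long placementId) {
        Set<Long> placements = ConcurrentHashMap.newKeySet();
        placements.add(placementId);
        if (TABLE.putIfAbsent(reviewId, placements) != null) {
            throw new UniqueViolation(reviewId);
        }
    }

    // UPDATE path: the row exists, so only add the product reference.
    static void addPlacement(String reviewId, long placementId) {
        TABLE.get(reviewId).add(placementId);
    }

    // The existence check alone is racy, so a failed insert is treated as
    // "another consumer inserted it first" and retried as an update.
    static void upsert(String reviewId, long placementId) {
        if (TABLE.containsKey(reviewId)) {
            addPlacement(reviewId, placementId);
            return;
        }
        try {
            insert(reviewId, placementId);
        } catch (UniqueViolation e) {
            addPlacement(reviewId, placementId);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(5); // ~5 concurrent consumers
        for (long placement = 1; placement <= 5; placement++) {
            final long p = placement;
            pool.submit(() -> upsert("R1", p));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(TABLE.get("R1").size()); // 5: every consumer's placement was recorded
    }
}
```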

Update to question 1: Would a simple @Lock on my review repository prevent the unique-constraint exception?

@Lock(LockModeType.PESSIMISTIC_WRITE)
CustomerReview findByReviewIdentifier(String reviewIdentifier);

What happens when findByReviewIdentifier returns null? Can Hibernate lock the reviewIdentifier for a potential insert even if the method returns null?

Thank you!

Answer 1

Answer by Madhusudana Reddy Sunnapu

From a performance point of view, I would consider evaluating the solution with the following changes.

Changing from bidirectional ManyToMany to bidirectional OneToMany

I had the same question about which one is more efficient in terms of the DML statements that get executed. Quoting from "Typical ManyToMany mapping versus two OneToMany":

The option one might be simpler from a configuration perspective, but it yields less efficient DML statements.

Use the second option because whenever the associations are controlled by @ManyToOne associations, the DML statements are always the most efficient ones.

Enable the batching of DML statements

Enabling batching support results in fewer round trips to the database to insert/update the same number of records.

Quoting from "batch INSERT and UPDATE statements":

hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true

Reduce the number of saveAndFlush calls

The current code gets the ProductPlacement and, for each review, does a saveAndFlush, which results in no batching of DML statements.

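To make the cost of flushing per review concrete, here is a simplified back-of-the-envelope model, not a measurement. Assumptions: each new review produces exactly one insert into the review table (named customer_review here, a hypothetical name) and one insert into tb_miner_review_to_product; the SELECTs issued by findOne/merge are ignored; and with hibernate.order_inserts=true, inserts into the same table are grouped and sent in chunks of hibernate.jdbc.batch_size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchRoundTripSketch {

    // Simplified model of one flush: each element names the table one INSERT targets.
    // Statements for the same table are grouped and sent in chunks of batchSize,
    // i.e. ceil(count / batchSize) round trips per table.
    static int roundTripsForFlush(List<String> statementTables, int batchSize) {
        Map<String, Integer> perTable = new HashMap<>();
        for (String table : statementTables) {
            perTable.merge(table, 1, Integer::sum);
        }
        int trips = 0;
        for (int count : perTable.values()) {
            trips += (count + batchSize - 1) / batchSize; // integer ceiling
        }
        return trips;
    }

    public static void main(String[] args) {
        int batchSize = 50; // hibernate.jdbc.batch_size from the answer
        int reviews = 15;   // largest message from the queue
        // Assumption: one insert per table for every new review.
        List<String> perReview = Arrays.asList("customer_review", "tb_miner_review_to_product");

        int flushPerReview = 0; // saveAndFlush inside the loop
        List<String> allStatements = new ArrayList<>();
        for (int i = 0; i < reviews; i++) {
            flushPerReview += roundTripsForFlush(perReview, batchSize);
            allStatements.addAll(perReview);
        }
        int singleFlush = roundTripsForFlush(allStatements, batchSize); // one flush at commit

        System.out.println(flushPerReview); // 30
        System.out.println(singleFlush);    // 2
    }
}
```

Note that Hibernate cannot batch inserts for an entity whose id uses GenerationType.IDENTITY; CustomerReview's id is assigned by the application, so review inserts are not affected by that restriction.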

Instead, I would consider loading the ProductPlacement entity, adding the List<CustomerReview> customerReviews to the Set<CustomerReview> customerReviews field of the ProductPlacement entity, and finally calling the merge method once at the end, with these two changes:

Make the ProductPlacement entity the owner of the association, i.e. move the mappedBy attribute onto the Set<ProductPlacement> productPlacements field of the CustomerReview entity.
Make the CustomerReview entity implement the equals and hashCode methods using the reviewIdentifier field. I believe reviewIdentifier is unique and user assigned.

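The second change can be sketched in plain Java. This is a minimal sketch, assuming reviewIdentifier is the unique, application-assigned key; the JPA annotations are omitted and the class name ReviewEqualitySketch is hypothetical. With equals and hashCode based on reviewIdentifier, adding an incoming review to the customerReviews set does not create a duplicate entry when a review with the same identifier is already there. Using instanceof and the getter rather than getClass() and direct field access is a commonly recommended choice so that Hibernate proxies, which are subclasses, still compare correctly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ReviewEqualitySketch {

    // Plain-Java sketch of CustomerReview; the JPA annotations from the question are omitted.
    static class CustomerReview {
        private String reviewIdentifier; // application-assigned, unique business key

        protected CustomerReview() { } // JPA requires a no-arg constructor

        CustomerReview(String reviewIdentifier) {
            this.reviewIdentifier = Objects.requireNonNull(reviewIdentifier);
        }

        public String getReviewIdentifier() {
            return reviewIdentifier;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            // instanceof + getter instead of getClass() + field access (proxy-friendly).
            if (!(o instanceof CustomerReview)) return false;
            return Objects.equals(getReviewIdentifier(), ((CustomerReview) o).getReviewIdentifier());
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(getReviewIdentifier());
        }
    }

    public static void main(String[] args) {
        // Stands in for the ProductPlacement.customerReviews set.
        Set<CustomerReview> customerReviews = new HashSet<>();
        customerReviews.add(new CustomerReview("R1")); // already associated
        // Incoming message: R1 again (a different instance) plus a new R2.
        customerReviews.addAll(Arrays.asList(new CustomerReview("R1"), new CustomerReview("R2")));
        System.out.println(customerReviews.size()); // 2
    }
}
```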

Finally, as you do performance tuning with these changes, baseline your performance with the current code. Then make the changes and compare whether they really result in any significant performance improvement for your solution.
