Original source: http://stackoverflow.com/questions/43883786/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Spring JPA: What is the cost of saveAndFlush vs save?
Asked by skyman
I have an application built from a set of microservices. One service receives data, persists it via Spring JPA and EclipseLink, and then sends an alert (AMQP) to a second service.
Based on specific conditions, the second service then calls a RESTful web service against the persisted data to retrieve the saved information.
I have noticed that sometimes the RESTful service returns a null data set even though the data has been previously saved. Looking at the code for the persisting service, save has been used instead of saveAndFlush, so I assume that data is not being flushed fast enough for the downstream service to query.
- Is there a cost with saveAndFlush that I should be wary of, or is it reasonable to use it by default?
- Would it ensure immediacy of data availability to downstream applications?
I should say that the original persistence function is wrapped in @Transactional.
Answered by Edwin Dalorzo
Possible Prognosis of the Problem
I believe the issue here has nothing to do with save vs saveAndFlush. The problem seems related to the nature of Spring @Transactional methods, and to a wrongful use of these transactions within a distributed environment that involves both your database and an AMQP broker; add to that poisonous mix, perhaps, some basic misunderstandings of how the JPA context works.
In your explanation, you seem to imply that you start your JPA transaction within a @Transactional method, and during the transaction (but before it has committed) you send messages to an AMQP broker; later, at the other side of the queue, a consumer application gets the messages and makes a REST service invocation. At this point you notice that the transactional changes from the publisher side have not yet been committed to the database and therefore are not visible to the consumer side.
The problem seems to be that you propagate those AMQP messages within your JPA transaction before it has committed to disk. By the time the consumer reads a message and processes it, your transaction on the publishing side may not have finished yet, so those changes are not visible to the consumer application.
If your AMQP implementation is Rabbit, then I have seen this problem before: it occurs when you start a @Transactional method that uses a database transaction manager, and within that method you use a RabbitTemplate to send a corresponding message.
If your RabbitTemplate is not using a transacted channel (i.e. it is not configured with channelTransacted=true), then your message is delivered before the database transaction has committed. I believe that by enabling transacted channels (disabled by default) in your RabbitTemplate you solve part of the problem.
    <rabbit:template id="rabbitTemplate"
                     connection-factory="connectionFactory"
                     channel-transacted="true"/>
When the channel is transacted, the RabbitTemplate "joins" the current database transaction (which apparently is a JPA transaction). Once your JPA transaction commits, it runs some epilogue code that also commits the changes in your Rabbit channel, which forces the actual "sending" of the message.
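The ordering effect described above can be illustrated with a small plain-Java toy model. This is only a sketch under stated assumptions: ToyTransaction, ToyChannel and whatConsumerSees are hypothetical names invented for this illustration, not Spring or Rabbit classes, and the consumer is assumed to process each message the instant it is delivered (in a real system this is a race, not a certainty).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: these are NOT Spring AMQP or JPA types.
final class TransactedChannelDemo {

    /** A transaction whose writes become visible only on commit. */
    static final class ToyTransaction {
        private final Map<Long, String> committedStore;
        private final Map<Long, String> pending = new HashMap<>();
        private final List<Runnable> afterCommit = new ArrayList<>();

        ToyTransaction(Map<Long, String> committedStore) {
            this.committedStore = committedStore;
        }

        void write(long id, String value) {
            pending.put(id, value);
        }

        void runAfterCommit(Runnable action) {
            afterCommit.add(action);
        }

        void commit() {
            committedStore.putAll(pending);      // data becomes visible first
            pending.clear();
            afterCommit.forEach(Runnable::run);  // "epilogue": deferred sends
            afterCommit.clear();
        }
    }

    /** A channel that either sends immediately or defers until commit. */
    static final class ToyChannel {
        private final boolean transacted;
        private final Consumer<Long> consumer;

        ToyChannel(boolean transacted, Consumer<Long> consumer) {
            this.transacted = transacted;
            this.consumer = consumer;
        }

        void send(long id, ToyTransaction tx) {
            if (transacted) {
                tx.runAfterCommit(() -> consumer.accept(id));
            } else {
                consumer.accept(id); // delivered before the commit
            }
        }
    }

    /** Returns what the consumer reads from the store when the message arrives. */
    static String whatConsumerSees(boolean transacted) {
        Map<Long, String> store = new HashMap<>();
        String[] seen = new String[1];
        ToyChannel channel = new ToyChannel(transacted, id -> seen[0] = store.get(id));

        ToyTransaction tx = new ToyTransaction(store);
        tx.write(1L, "order-1");
        channel.send(1L, tx);
        tx.commit();
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(whatConsumerSees(false)); // prints null
        System.out.println(whatConsumerSees(true));  // prints order-1
    }
}
```

With the non-transacted channel the consumer runs before the commit and finds nothing; with the transacted one the send is deferred until after the commit, so the row is already visible.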
About save vs saveAndFlush
You might think that flushing the changes in your JPA context should have solved the problem, but you'd be wrong. Flushing your JPA context just forces the changes in your entities (at that point just in memory) to be written to disk, but they are still written to disk within a corresponding database transaction, which won't commit until your JPA transaction commits. That happens at the end of your @Transactional method (and unfortunately some time after you had already sent your AMQP messages, if you don't use a transacted channel as explained above).
So, even if you flush your JPA context, your consumer application won't see those changes (as per classical database isolation level rules) until your @Transactional method has finished in your publisher application.
When you invoke save(entity), the EntityManager does not need to synchronize any changes right away. Most JPA implementations just mark the entities as dirty in memory, and wait until the last minute to synchronize all changes with the database and commit those changes at database level.
Note: there are cases in which you may want some of those changes to go down to disk right away, rather than whenever the whimsical EntityManager decides to do so. A classical example is a trigger on a database table that must run to generate some additional records you will need later during your transaction. So you force a flush of the changes to disk such that the trigger is forced to run.
By flushing the context, you're simply forcing a synchronization of changes in memory to disk, but this does not imply an instant database commit of those modifications. Hence, the changes you flush won't necessarily be visible to other transactions. Most likely they won't, based on traditional database isolation levels.
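To make the distinction between save, flush and commit concrete, here is a minimal plain-Java sketch. It is a toy model, not the JPA API: ToyDatabase, ToyTransaction and observe are hypothetical names, and the "database" is assumed to behave like a read-committed store in which other transactions only ever see committed rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT JPA or Spring types.
final class FlushVsCommitDemo {

    /** A "database" whose other readers only ever see committed rows. */
    static final class ToyDatabase {
        private final Map<Long, String> committed = new HashMap<>();

        String readFromAnotherTransaction(long id) {
            return committed.get(id);
        }
    }

    /** One open transaction together with its in-memory persistence context. */
    static final class ToyTransaction {
        private final ToyDatabase db;
        private final Map<Long, String> dirtyInMemory = new HashMap<>();
        private final Map<Long, String> flushedNotCommitted = new HashMap<>();

        ToyTransaction(ToyDatabase db) {
            this.db = db;
        }

        /** Like save(): the change is only recorded in memory. */
        void save(long id, String value) {
            dirtyInMemory.put(id, value);
        }

        /** Like flush(): the change reaches the database, but stays uncommitted. */
        void flush() {
            flushedNotCommitted.putAll(dirtyInMemory);
            dirtyInMemory.clear();
        }

        /** Commit flushes whatever is left and only then publishes the rows. */
        void commit() {
            flush();
            db.committed.putAll(flushedNotCommitted);
            flushedNotCommitted.clear();
        }
    }

    /** Records what another transaction sees after save, flush and commit. */
    static List<String> observe() {
        ToyDatabase db = new ToyDatabase();
        ToyTransaction tx = new ToyTransaction(db);
        List<String> seen = new ArrayList<>();

        tx.save(1L, "reading-1");
        seen.add(db.readFromAnotherTransaction(1L));
        tx.flush();
        seen.add(db.readFromAnotherTransaction(1L));
        tx.commit();
        seen.add(db.readFromAnotherTransaction(1L));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints [null, null, reading-1]
    }
}
```

In this model, flushing changes nothing for the outside reader; only the commit does, which mirrors why swapping save for saveAndFlush would not be expected to help the downstream service.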
The 2PC Problem
Another classical problem here is that your database and your AMQP broker are two independent systems. If this is about Rabbit, then you don't have a 2PC (two-phase commit).
So you may want to account for interesting scenarios, e.g. your database transaction successfully commits, but then Rabbit fails to commit your message, in which case you will have to repeat the entire transaction, possibly skipping the database side effects and just re-attempting to send the message to Rabbit.
You should probably read the article "Distributed transactions in Spring, with and without XA"; its section on chained transactions is particularly helpful for addressing this problem.
They suggest a more complex transaction manager definition. For example:
    <bean id="jdbcTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource"/>
    </bean>
    <bean id="rabbitTransactionManager" class="org.springframework.amqp.rabbit.transaction.RabbitTransactionManager">
        <property name="connectionFactory" ref="connectionFactory"/>
    </bean>
    <bean id="chainedTransactionManager" class="org.springframework.data.transaction.ChainedTransactionManager">
        <constructor-arg name="transactionManagers">
            <array>
                <ref bean="rabbitTransactionManager"/>
                <ref bean="jdbcTransactionManager"/>
            </array>
        </constructor-arg>
    </bean>
And then, in your code, you just use that chained transaction manager to coordinate both your database transactional part and your Rabbit transactional part.
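For projects that use Java configuration instead of XML, the bean definitions above might look roughly like the following. This is a sketch, not a drop-in configuration: the class name MessagingTransactionConfig is hypothetical, the DataSource and ConnectionFactory beans are assumed to be defined elsewhere, and it needs spring-jdbc, spring-rabbit and spring-data-commons on the classpath. Note also that ChainedTransactionManager has been deprecated in more recent Spring Data releases, so check the version you are on.

```java
import javax.sql.DataSource;

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

// Hypothetical configuration class; bean names match the XML ids above.
@Configuration
public class MessagingTransactionConfig {

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        template.setChannelTransacted(true); // same as channel-transacted="true"
        return template;
    }

    @Bean
    public DataSourceTransactionManager jdbcTransactionManager(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

    @Bean
    public RabbitTransactionManager rabbitTransactionManager(ConnectionFactory connectionFactory) {
        return new RabbitTransactionManager(connectionFactory);
    }

    @Bean
    public ChainedTransactionManager chainedTransactionManager(
            RabbitTransactionManager rabbitTransactionManager,
            DataSourceTransactionManager jdbcTransactionManager) {
        // Same order as the XML array: Rabbit first, then JDBC.
        return new ChainedTransactionManager(rabbitTransactionManager, jdbcTransactionManager);
    }
}
```

Because several transaction managers are now registered, methods would select this one by name, as in @Transactional("chainedTransactionManager").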
Now, there is still the potential that you commit your database part, but that your Rabbit transaction part fails.
So, imagine something like this:
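    @Retry // placeholder for a retry mechanism, e.g. Spring Retry's @Retryable
    @Transactional("chainedTransactionManager")
    public void myServiceOperation() {
        // Skip the database work if an earlier attempt already committed it.
        if (workNotDone()) {
            doDatabaseTransactionWork();
        }
        // Always (re)send the message; a retry lands here after a Rabbit failure.
        sendMessagesToRabbit();
    }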
In this manner, if your Rabbit transactional part failed for any reason, and you were forced to retry the entire chained transaction, you would avoid repeating the database side effects and simply make sure to send the failed message to Rabbit.
At the same time, if your database part fails, then the message is never sent to Rabbit and there is no problem.
Alternatively, if your database side effects are idempotent, then you can skip the check, just reapply the database changes and re-attempt to send the message to Rabbit.
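The retry behaviour described above can be modelled in plain Java. This is a toy sketch under explicit assumptions: Service, retry and the failure counter are hypothetical, the database work is treated as already committed when the send fails, and the broker failure is simulated with an exception. It is not Spring Retry and makes no claim about how a real retry library schedules attempts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no Spring, no real database, no real broker.
final class RetryDemo {

    static final class Service {
        int databaseWrites = 0;
        final List<String> delivered = new ArrayList<>();
        private boolean workDone = false;
        private int sendFailuresLeft;

        Service(int simulatedSendFailures) {
            this.sendFailuresLeft = simulatedSendFailures;
        }

        /** Mirrors myServiceOperation(): guarded database work, then a send. */
        void myServiceOperation() {
            if (!workDone) {          // plays the role of workNotDone()
                databaseWrites++;     // plays the role of doDatabaseTransactionWork()
                workDone = true;
            }
            sendMessageToBroker();
        }

        private void sendMessageToBroker() {
            if (sendFailuresLeft > 0) {
                sendFailuresLeft--;
                throw new IllegalStateException("broker unavailable (simulated)");
            }
            delivered.add("alert");
        }
    }

    /** Re-runs the operation until it succeeds or the attempts run out. */
    static void retry(Runnable operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                operation.run();
                return;
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        Service service = new Service(2);
        retry(service::myServiceOperation, 3);
        System.out.println(service.databaseWrites);   // prints 1
        System.out.println(service.delivered.size()); // prints 1
    }
}
```

Even though the operation runs three times, the guarded database work happens once and exactly one message is delivered, which is the property the check is meant to provide.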
The truth is that what you're trying to do initially seems deceptively easy, but once you delve into the different problems and understand them, you realize it is a tricky business to do this the right way.