postgresql Postgres UPDATE with ORDER BY,怎么做?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/44660368/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Postgres UPDATE with ORDER BY, how to do it?
提问by bbozo
I need to do a Postgres update on a collection of records & I'm trying to prevent a deadlock which appeared in the stress tests.
我需要对一组记录进行 Postgres 更新,我正试图防止出现在压力测试中的死锁。
The typical resolution to this is to update records in a certain order, by ID for example - but it seems that Postgres doesn't allow ORDER BY for UPDATE.
对此的典型解决方案是按特定顺序更新记录,例如按 ID - 但似乎 Postgres 不允许 ORDER BY 进行 UPDATE。
Assuming I need to do an update, for example:
假设我需要进行更新,例如:
UPDATE BALANCES WHERE ID IN (SELECT ID FROM some_function() ORDER BY ID);
results in deadlocks when you run 200 queries concurrently. What to do?
当您同时运行 200 个查询时会导致死锁。该怎么办?
I'm looking for a general solution, not case-specific workarounds like in UPDATE with ORDER BY
我正在寻找一个通用的解决方案,而不是像在UPDATE with ORDER BY 中那样针对特定情况的解决方法
It feelsthat there must be a better solution than writing a cursor function. Also, if there's no better way, how would that cursor function optimally look like? Update record-by-record
它认为,必须有比写一个光标功能更好的解决方案。另外,如果没有更好的方法,那么该游标功能的最佳外观如何?逐条更新
回答by Nick Barnes
As far as I know, there's no way to accomplish this directly through the UPDATE
statement; the only way to guarantee lock order is to explicitly acquire locks with a SELECT ... ORDER BY ID FOR UPDATE
, e.g.:
据我所知,没有办法直接通过UPDATE
语句来实现这一点;保证锁顺序的唯一方法是使用 a 显式获取锁SELECT ... ORDER BY ID FOR UPDATE
,例如:
UPDATE Balances
SET Balance = 0
WHERE ID IN (
SELECT ID FROM Balances
WHERE ID IN (SELECT ID FROM some_function())
ORDER BY ID
FOR UPDATE
)
This has the downside of repeating the ID
index lookup on the Balances
table. In your simple example, you can avoid this overhead by fetching the physical row address (represented by the ctid
system column) during the locking query, and using that to drive the UPDATE
:
这具有ID
在Balances
表上重复索引查找的缺点。在您的简单示例中,您可以通过在锁定查询期间获取物理行地址(由ctid
system column表示)来避免这种开销,并使用它来驱动UPDATE
:
UPDATE Balances
SET Balance = 0
WHERE ctid = ANY(ARRAY(
SELECT ctid FROM Balances
WHERE ID IN (SELECT ID FROM some_function())
ORDER BY ID
FOR UPDATE
))
(Be careful when using ctid
s, as the values are transient. We're safe here, as the locks will block any changes.)
(使用ctid
s时要小心,因为值是瞬态的。我们在这里是安全的,因为锁会阻止任何更改。)
Unfortunately, the planner will only utilise the ctid
in a narrow set of cases (you can tell if it's working by looking for a "Tid Scan" node in the EXPLAIN
output). To handle more complicated queries within a single UPDATE
statement, e.g. if your new balance was being returned by some_function()
alongside the ID, you'll need to fall back to the ID-based lookup:
不幸的是,规划器只会ctid
在少数情况下使用(您可以通过在EXPLAIN
输出中查找“Tid Scan”节点来判断它是否有效)。要在单个UPDATE
语句中处理更复杂的查询,例如,如果您的新余额some_function()
与 ID 一起返回,您需要回退到基于 ID 的查找:
UPDATE Balances
SET Balance = Locks.NewBalance
FROM (
SELECT Balances.ID, some_function.NewBalance
FROM Balances
JOIN some_function() ON some_function.ID = Balances.ID
ORDER BY Balances.ID
FOR UPDATE
) Locks
WHERE Balances.ID = Locks.ID
If the performance overhead is an issue, you'd need to resort to using a cursor, which would look something like this:
如果性能开销是一个问题,您需要求助于使用游标,它看起来像这样:
DO $$
DECLARE
c CURSOR FOR
SELECT Balances.ID, some_function.NewBalance
FROM Balances
JOIN some_function() ON some_function.ID = Balances.ID
ORDER BY Balances.ID
FOR UPDATE;
BEGIN
FOR row IN c LOOP
UPDATE Balances
SET Balance = row.NewBalance
WHERE CURRENT OF c;
END LOOP;
END
$$
回答by flutter
In general, concurrency is difficult. Especially with 200 statements (i'm assuming you don't only query = SELECT) or even transactions (actually every single statement issued is wrapped into a transaction if it's not in a transaction already).
一般来说,并发是困难的。尤其是有 200 条语句(我假设您不仅查询 = SELECT)甚至事务(实际上,如果它不在事务中,则发出的每个语句都被包装到事务中)。
The general solution concepts are (a combination of) these:
一般的解决方案概念是(组合)这些:
To be aware that deadlocks can happen, catch them in the application, check the Error Codesfor
class 40
or40P01
and retry the transaction.Reserve locks. Use
SELECT ... FOR UPDATE
. Evade explicit locks as long as possible. Locks will force other transactions to wait for lock release, which harms concurrency, but can prevent transactions running into deadlocks. Check the example for deadlocks in chapter 13. Especially the one in which transaction A waits for B and B waits for A (the bank account thingy).Choose a different Isolation Level, for example a weaker one like
READ COMMITED
, if possible. Be aware ofLOST UPDATE
s inREAD COMMITED
mode. Prevent them withREPEATABLE READ
.
要知道,死锁可能发生,抓住他们的应用程序,检查错误代码为
class 40
或40P01
和重试事务。预留锁。使用
SELECT ... FOR UPDATE
. 尽可能长时间地避开显式锁。锁会强制其他事务等待锁释放,这会损害并发性,但可以防止事务陷入死锁。检查第 13 章中的死锁示例。尤其是事务 A 等待 B 和 B 等待 A(银行帐户的东西)的例子。如果可能,请选择不同的隔离级别,例如较弱的隔离级别,例如
READ COMMITED
。请注意LOST UPDATE
sREAD COMMITED
模式。用 防止它们REPEATABLE READ
。
Write your statements with locks in the same order in EVERY transaction, for example by table name alphabetically.
在每个事务中以相同的顺序编写带有锁的语句,例如按表名的字母顺序。
LOCK / USE A -- Transaction 1
LOCK / USE B -- Transaction 1
LOCK / USE C -- Transaction 1
-- D not used -- Transaction 1
-- A not used -- Transaction 2
LOCK / USE B -- Transaction 2
-- C not used -- Transaction 2
LOCK / USE D -- Transaction 2
with the general locking order A B C D
. This way, the transactions can interleave in any relative order and still have a good chance not to deadlock (depending on your statements you may have other serialization issues though). The statements of the transactions will run in the order specified by them, but it can be that transaction 1 runs their first 2, then xact 2 runs the first one, then 1 finishes and finally xact 2 finishes.
跟一般的锁定顺序A B C D
。这样,事务可以以任何相对顺序交错,并且仍然有很好的机会不会死锁(根据您的语句,您可能还有其他序列化问题)。事务的语句将按照它们指定的顺序运行,但也可以是事务 1 运行它们的第一个 2,然后 xact 2 运行第一个,然后 1 完成,最后 xact 2 完成。
Also, you should realise that a statement involving multiple rows is not executed atomically in a concurrent situation. In other words, if you have two statements A and B involving multiple rows, then they can be executed in this order:
此外,您应该意识到在并发情况下不会自动执行涉及多行的语句。换句话说,如果您有两条语句 A 和 B 涉及多行,那么它们可以按以下顺序执行:
a1 b1 a2 a3 a4 b2 b3
but NOT as a block of a's followed by b's.
The same applies to a statement with a sub-query.
Have you looked at the query plans using EXPLAIN
?
但不是作为 a 的块后跟 b 的。这同样适用于带有子查询的语句。您是否使用过查看查询计划EXPLAIN
?
In your case, you can try
在你的情况下,你可以尝试
UPDATE BALANCES WHERE ID IN (
SELECT ID FROM some_function() FOR UPDATE -- LOCK using FOR UPDATE
-- other transactions will WAIT / BLOCK temporarily on conc. write access
);
If possible by what you want to do, you can also use SELECT ... FOR UPDATE SKIP LOCK, which will skip already locked data to get back concurrency, which is lost by WAITing for another transaction to release a lock (FOR UPDATE). But this will not apply an UPDATE to locked rows, which your application logic might require. So run that later on (see point 1).
如果可能的话,您还可以使用SELECT ... FOR UPDATE SKIP LOCK,这将跳过已经锁定的数据以取回并发性,而并发性因等待另一个事务释放锁(FOR UPDATE)而丢失。但这不会将 UPDATE 应用于您的应用程序逻辑可能需要的锁定行。所以稍后再运行(见第 1 点)。
Also read LOST UPDATEabout the LOST UPDATE
and
SKIP LOCKEDabout SKIP LOCKED
. A queue might be an idea in your case, which is explained perfectly in the SKIP LOCKED
reference, although relational DBMS are not meant to be queues.
又读丢失更新有关LOST UPDATE
和
SKIP LOCKED有关SKIP LOCKED
。在您的情况下,队列可能是一个想法,SKIP LOCKED
参考资料中对此进行了完美的解释,尽管关系 DBMS 并不意味着是队列。
HTH
HTH