postgresql Postgres UPDATE with ORDER BY,怎么做?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/44660368/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 02:35:16  来源:igfitidea点击:

Postgres UPDATE with ORDER BY, how to do it?

postgresqldatabase-deadlocks

提问by bbozo

I need to do a Postgres update on a collection of records & I'm trying to prevent a deadlock which appeared in the stress tests.

我需要对一组记录进行 Postgres 更新,我正试图防止出现在压力测试中的死锁。

The typical resolution to this is to update records in a certain order, by ID for example - but it seems that Postgres doesn't allow ORDER BY for UPDATE.

对此的典型解决方案是按特定顺序更新记录,例如按 ID - 但似乎 Postgres 不允许 ORDER BY 进行 UPDATE。

Assuming I need to do an update, for example:

假设我需要进行更新,例如:

UPDATE BALANCES WHERE ID IN (SELECT ID FROM some_function() ORDER BY ID);

results in deadlocks when you run 200 queries concurrently. What to do?

当您同时运行 200 个查询时会导致死锁。该怎么办?

I'm looking for a general solution, not case-specific workarounds like in UPDATE with ORDER BY

我正在寻找一个通用的解决方案,而不是像在UPDATE with ORDER BY 中那样针对特定情况的解决方法

It feelsthat there must be a better solution than writing a cursor function. Also, if there's no better way, how would that cursor function optimally look like? Update record-by-record

认为,必须有比写一个光标功能更好的解决方案。另外,如果没有更好的方法,那么该游标功能的最佳外观如何?逐条更新

回答by Nick Barnes

As far as I know, there's no way to accomplish this directly through the UPDATEstatement; the only way to guarantee lock order is to explicitly acquire locks with a SELECT ... ORDER BY ID FOR UPDATE, e.g.:

据我所知,没有办法直接通过UPDATE语句来实现这一点;保证锁顺序的唯一方法是使用 a 显式获取锁SELECT ... ORDER BY ID FOR UPDATE,例如:

UPDATE Balances
SET Balance = 0
WHERE ID IN (
  SELECT ID FROM Balances
  WHERE ID IN (SELECT ID FROM some_function())
  ORDER BY ID
  FOR UPDATE
)

This has the downside of repeating the IDindex lookup on the Balancestable. In your simple example, you can avoid this overhead by fetching the physical row address (represented by the ctidsystem column) during the locking query, and using that to drive the UPDATE:

这具有IDBalances表上重复索引查找的缺点。在您的简单示例中,您可以通过在锁定查询期间获取物理行地址(由ctidsystem column表示)来避免这种开销,并使用它来驱动UPDATE

UPDATE Balances
SET Balance = 0
WHERE ctid = ANY(ARRAY(
  SELECT ctid FROM Balances
  WHERE ID IN (SELECT ID FROM some_function())
  ORDER BY ID
  FOR UPDATE
))

(Be careful when using ctids, as the values are transient. We're safe here, as the locks will block any changes.)

(使用ctids时要小心,因为值是瞬态的。我们在这里是安全的,因为锁会阻止任何更改。)

Unfortunately, the planner will only utilise the ctidin a narrow set of cases (you can tell if it's working by looking for a "Tid Scan" node in the EXPLAINoutput). To handle more complicated queries within a single UPDATEstatement, e.g. if your new balance was being returned by some_function()alongside the ID, you'll need to fall back to the ID-based lookup:

不幸的是,规划器只会ctid在少数情况下使用(您可以通过在EXPLAIN输出中查找“Tid Scan”节点来判断它是否有效)。要在单个UPDATE语句中处理更复杂的查询,例如,如果您的新余额some_function()与 ID 一起返回,您需要回退到基于 ID 的查找:

UPDATE Balances
SET Balance = Locks.NewBalance
FROM (
  SELECT Balances.ID, some_function.NewBalance
  FROM Balances
  JOIN some_function() ON some_function.ID = Balances.ID
  ORDER BY Balances.ID
  FOR UPDATE
) Locks
WHERE Balances.ID = Locks.ID

If the performance overhead is an issue, you'd need to resort to using a cursor, which would look something like this:

如果性能开销是一个问题,您需要求助于使用游标,它看起来像这样:

DO $$
DECLARE
  c CURSOR FOR
    SELECT Balances.ID, some_function.NewBalance
    FROM Balances
    JOIN some_function() ON some_function.ID = Balances.ID
    ORDER BY Balances.ID
    FOR UPDATE;
BEGIN
  FOR row IN c LOOP
    UPDATE Balances
    SET Balance = row.NewBalance
    WHERE CURRENT OF c;
  END LOOP;
END
$$

回答by flutter

In general, concurrency is difficult. Especially with 200 statements (i'm assuming you don't only query = SELECT) or even transactions (actually every single statement issued is wrapped into a transaction if it's not in a transaction already).

一般来说,并发是困难的。尤其是有 200 条语句(我假设您不仅查询 = SELECT)甚至事务(实际上,如果它不在事务中,则发出的每个语句都被包装到事务中)。

The general solution concepts are (a combination of) these:

一般的解决方案概念是(组合)这些:

  1. To be aware that deadlocks can happen, catch them in the application, check the Error Codesfor class 40or 40P01and retry the transaction.

  2. Reserve locks. Use SELECT ... FOR UPDATE. Evade explicit locks as long as possible. Locks will force other transactions to wait for lock release, which harms concurrency, but can prevent transactions running into deadlocks. Check the example for deadlocks in chapter 13. Especially the one in which transaction A waits for B and B waits for A (the bank account thingy).

  3. Choose a different Isolation Level, for example a weaker one like READ COMMITED, if possible. Be aware of LOST UPDATEs in READ COMMITEDmode. Prevent them with REPEATABLE READ.

  1. 要知道,死锁可能发生,抓住他们的应用程序,检查错误代码class 4040P01和重试事务。

  2. 预留锁。使用SELECT ... FOR UPDATE. 尽可能长时间地避开显式锁。锁会强制其他事务等待锁释放,这会损害并发性,但可以防止事务陷入死锁。检查第 13 章中的死锁示例。尤其是事务 A 等待 B 和 B 等待 A(银行帐户的东西)的例子。

  3. 如果可能,请选择不同的隔离级别,例如较弱的隔离级别,例如READ COMMITED。请注意LOST UPDATEsREAD COMMITED模式。用 防止它们REPEATABLE READ

Write your statements with locks in the same order in EVERY transaction, for example by table name alphabetically.

在每个事务中以相同的顺序编写带有锁的语句,例如按表名的字母顺序。

LOCK / USE A  -- Transaction 1 
LOCK / USE B  -- Transaction 1
LOCK / USE C  -- Transaction 1
-- D not used -- Transaction 1

-- A not used -- Transaction 2
LOCK / USE B  -- Transaction 2
-- C not used -- Transaction 2
LOCK / USE D  -- Transaction 2

with the general locking order A B C D. This way, the transactions can interleave in any relative order and still have a good chance not to deadlock (depending on your statements you may have other serialization issues though). The statements of the transactions will run in the order specified by them, but it can be that transaction 1 runs their first 2, then xact 2 runs the first one, then 1 finishes and finally xact 2 finishes.

跟一般的锁定顺序A B C D。这样,事务可以以任何相对顺序交错,并且仍然有很好的机会不会死锁(根据您的语句,您可能还有其他序列化问题)。事务的语句将按照它们指定的顺序运行,但也可以是事务 1 运行它们的第一个 2,然后 xact 2 运行第一个,然后 1 完成,最后 xact 2 完成。

Also, you should realise that a statement involving multiple rows is not executed atomically in a concurrent situation. In other words, if you have two statements A and B involving multiple rows, then they can be executed in this order:

此外,您应该意识到在并发情况下不会自动执行涉及多行的语句。换句话说,如果您有两条语句 A 和 B 涉及多行,那么它们可以按以下顺序执行:

a1 b1 a2 a3 a4 b2 b3     

but NOT as a block of a's followed by b's. The same applies to a statement with a sub-query. Have you looked at the query plans using EXPLAIN?

但不是作为 a 的块后跟 b 的。这同样适用于带有子查询的语句。您是否使用过查看查询计划EXPLAIN

In your case, you can try

在你的情况下,你可以尝试

UPDATE BALANCES WHERE ID IN (
 SELECT ID FROM some_function() FOR UPDATE  -- LOCK using FOR UPDATE 
 -- other transactions will WAIT / BLOCK temporarily on conc. write access
);

If possible by what you want to do, you can also use SELECT ... FOR UPDATE SKIP LOCK, which will skip already locked data to get back concurrency, which is lost by WAITing for another transaction to release a lock (FOR UPDATE). But this will not apply an UPDATE to locked rows, which your application logic might require. So run that later on (see point 1).

如果可能的话,您还可以使用SELECT ... FOR UPDATE SKIP LOCK,这将跳过已经锁定的数据以取回并发性,而并发性因等待另一个事务释放锁(FOR UPDATE)而丢失。但这不会将 UPDATE 应用于您的应用程序逻辑可能需要的锁定行。所以稍后再运行(见第 1 点)。

Also read LOST UPDATEabout the LOST UPDATEand SKIP LOCKEDabout SKIP LOCKED. A queue might be an idea in your case, which is explained perfectly in the SKIP LOCKEDreference, although relational DBMS are not meant to be queues.

又读丢失更新有关LOST UPDATESKIP LOCKED有关SKIP LOCKED。在您的情况下,队列可能是一个想法,SKIP LOCKED参考资料中对此进行了完美的解释,尽管关系 DBMS 并不意味着是队列。

HTH

HTH