SQL 如何在 PostgreSQL 中使用 RETURNING 和 ON CONFLICT?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34708509/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to use RETURNING with ON CONFLICT in PostgreSQL?
提问by zola
I have the following UPSERT in PostgreSQL 9.5:
我在 PostgreSQL 9.5 中有以下 UPSERT:
INSERT INTO chats ("user", "contact", "name")
VALUES (, , ),
(, , NULL)
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;
If there are no conflicts it returns something like this:
如果没有冲突,它将返回如下内容:
----------
| id |
----------
1 | 50 |
----------
2 | 51 |
----------
But if there are conflicts it doesn't return any rows:
但如果有冲突,它不会返回任何行:
----------
| id |
----------
I want to return the new id
columns if there are no conflicts or return the existing id
columns of the conflicting columns.
Can this be done?If so, how?
id
如果没有冲突,我想返回新列或返回id
冲突列的现有列。
这能做到吗?如果是这样,如何?
采纳答案by Alextoni
I had exactly the same problem, and I solved it using 'do update' instead of 'do nothing', even though I had nothing to update. In your case it would be something like this:
我遇到了完全相同的问题,我使用“执行更新”而不是“什么都不做”来解决它,即使我没有什么要更新的。在你的情况下,它会是这样的:
INSERT INTO chats ("user", "contact", "name")
VALUES (, , ),
(, , NULL)
ON CONFLICT("user", "contact") DO UPDATE SET name=EXCLUDED.name RETURNING id;
This query will return all the rows, regardless they have just been inserted or they existed before.
此查询将返回所有行,无论它们是刚刚插入还是之前存在。
回答by Erwin Brandstetter
The currently accepted answerseems ok for a singleconflict target, fewconflicts, small tuples and no triggers. And it avoids concurrency issue 1(see below) with brute force. The simple solution has its appeal, the side effects may be less important.
在目前接受的答案似乎确定一个单一的冲突的目标,很少冲突,小元组和没有触发器。它用蛮力避免了并发问题 1(见下文)。简单的解决方案有其吸引力,副作用可能不那么重要。
For all other cases, though, do notupdate identical rows without need. Even if you see no difference on the surface, there are various side effects:
但是,对于所有其他情况,不要在不需要的情况下更新相同的行。即使表面上看不出任何区别,也有各种副作用:
It might fire triggers that should not be fired.
It write-locks "innocent" rows, possibly incurring costs for concurrent transactions.
It might make the row seem new, though it's old (transaction timestamp).
Most importantly, with PostgreSQL's MVCC modela new row version is written either way, no matter whether the row data is the same. This incurs a performance penalty for the UPSERT itself, table bloat, index bloat, performance penalty for all subsequent operations on the table,
VACUUM
cost. A minor effect for few duplicates, but massivefor mostly dupes.
它可能会触发不应触发的触发器。
它写锁定“无辜”行,可能会产生并发事务的成本。
它可能会使该行看起来很新,尽管它很旧(交易时间戳)。
最重要的是,使用PostgreSQL 的 MVCC 模型,无论行数据是否相同,都会以任何一种方式编写新的行版本。这会导致 UPSERT 本身的性能损失、表膨胀、索引膨胀、表上所有后续操作的性能损失、
VACUUM
成本。对少数重复的影响很小,但对大多数被骗者影响很大。
Plus, sometimes it is not practical or even possible to use ON CONFLICT DO UPDATE
. The manual:
另外,有时它不切实际甚至不可能使用ON CONFLICT DO UPDATE
。手册:
For
ON CONFLICT DO UPDATE
, aconflict_target
must be provided.
对于
ON CONFLICT DO UPDATE
,conflict_target
必须提供 a。
You can achieve (almost) the same without empty updates and side effects. And some of the following solutions also work with ON CONFLICT DO NOTHING
(no "conflict target"), to catch allpossible conflicts that might arise. (May or may not be desirable.)
您可以实现(几乎)相同的效果,而无需空更新和副作用。并且以下一些解决方案也适用于ON CONFLICT DO NOTHING
(无“冲突目标”),以捕获可能出现的所有可能的冲突。(可能是也可能不是可取的。)
Without concurrent write load
没有并发写入负载
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1') -- type casts in first row
, ('foo2', 'bar2', 'bob2')
-- more?
)
, ins AS (
INSERT INTO chats (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO NOTHING
RETURNING id --, usr, contact -- return more columns?
)
SELECT 'i' AS source -- 'i' for 'inserted'
, id --, usr, contact -- return more columns?
FROM ins
UNION ALL
SELECT 's' AS source -- 's' for 'selected'
, c.id --, usr, contact -- return more columns?
FROM input_rows
JOIN chats c USING (usr, contact); -- columns of unique index
The source
column is an optional addition to demonstrate how this works. You may actually need it to tell the difference between both cases (another advantage over empty writes).
该source
列是一个可选的附加内容,用于演示这是如何工作的。您实际上可能需要它来区分两种情况(与空写入相比的另一个优势)。
The final JOIN chats
works because newly inserted rows from an attached data-modifying CTEare not yet visible in the underlying table. (All parts of the same SQL statement see the same snapshots of underlying tables.)
最终的JOIN chats
工作是因为从附加的数据修改 CTE新插入的行在基础表中尚不可见。(同一 SQL 语句的所有部分都可以看到基础表的相同快照。)
Since the VALUES
expression is free-standing (not directly attached to an INSERT
) Postgres cannot derive data types from the target columns and you may have to add explicit type casts. The manual:
由于VALUES
表达式是独立的(不直接附加到INSERT
),Postgres 无法从目标列派生数据类型,您可能必须添加显式类型转换。手册:
When
VALUES
is used inINSERT
, the values are all automatically coerced to the data type of the corresponding destination column. When it's used in other contexts, it might be necessary to specify the correct data type. If the entries are all quoted literal constants, coercing the first is sufficient to determine the assumed type for all.
在
VALUES
中使用时INSERT
,所有值都会自动强制转换为对应目标列的数据类型。当它在其他上下文中使用时,可能需要指定正确的数据类型。如果条目都是带引号的文字常量,则强制第一个足以确定所有假定的类型。
The query itself may be a bit more expensive for fewdupes, due to the overhead of the CTE and the additional SELECT
(which should be cheap since the perfect index is there by definition - a unique constraint is implemented with an index).
由于 CTE 的开销和额外的开销(这应该很便宜,因为根据定义存在完美的索引 - 使用索引实现了唯一约束),因此对于少数欺骗者来说,查询本身可能会更昂贵一些SELECT
。
May be (much) faster for manyduplicates. The effective cost of additional writes depends on many factors.
对于许多重复项,可能(快得多)。额外写入的有效成本取决于许多因素。
But there are fewer side effects and hidden costsin any case. It's most probably cheaper overall.
但无论如何,副作用和隐性成本都更少。它很可能整体更便宜。
(Attached sequences are still advanced, since default values are filled in beforetesting for conflicts.)
(附加序列仍然是高级的,因为在测试冲突之前填充了默认值。)
About CTEs:
关于 CTE:
- Are SELECT type queries the only type that can be nested?
- Deduplicate SELECT statements in relational division
With concurrent write load
并发写负载
Assuming default READ COMMITTED
transaction isolation.
假设默认READ COMMITTED
事务隔离。
Related answer on dba.SE with detailed explanation:
dba.SE 上的相关答案,并附有详细说明:
The best strategy to defend against race conditions depends on exact requirements, the number and size of rows in the table and in the UPSERTs, the number of concurrent transactions, the likelihood of conflicts, available resources and other factors ...
防御竞争条件的最佳策略取决于确切的要求、表和 UPSERT 中行的数量和大小、并发事务的数量、冲突的可能性、可用资源和其他因素......
Concurrency issue 1
并发问题1
If a concurrent transaction has written to a row which your transaction now tries to UPSERT, your transaction has to wait for the other one to finish.
如果并发事务已写入您的事务现在尝试 UPSERT 的行,则您的事务必须等待另一个事务完成。
If the other transaction ends with ROLLBACK
(or any error, i.e. automatic ROLLBACK
), your transaction can proceed normally. Minor side effect: gaps in the sequential numbers. But no missing rows.
如果其他交易以ROLLBACK
(或任何错误,即自动ROLLBACK
)结束,则您的交易可以正常进行。次要副作用:序列号中的间隙。但没有丢失的行。
If the other transaction ends normally (implicit or explicit COMMIT
), your INSERT
will detect a conflict (the UNIQUE
index / constraint is absolute) and DO NOTHING
, hence also not return the row. (Also cannot lock the row as demonstrated in concurrency issue 2below, since it's not visible.) The SELECT
sees the same snapshot from the start of the query and also cannot return the yet invisible row.
如果其他事务正常结束(隐式或显式COMMIT
),您INSERT
将检测到冲突(UNIQUE
索引/约束是绝对的)DO NOTHING
,因此也不会返回该行。(也不能像下面的并发问题 2中演示的那样锁定行,因为它不可见。)SELECT
从查询的开始看到相同的快照,并且也不能返回尚不可见的行。
Any such rows are missing from the result set (even though they exist in the underlying table)!
结果集中缺少任何此类行(即使它们存在于基础表中)!
This may be ok as is. Especially if you are not returning rows like in the example and are satisfied knowing the row is there. If that's not good enough, there are various ways around it.
这可能没问题。特别是如果您没有像示例中那样返回行并且对知道该行在那里感到满意。如果这还不够好,有多种方法可以解决。
You could check the row count of the output and repeat the statement if it does not match the row count of the input. May be good enough for the rare case. The point is to start a new query (can be in the same transaction), which will then see the newly committed rows.
您可以检查输出的行数,如果它与输入的行数不匹配,则重复该语句。对于罕见的情况可能已经足够了。关键是启动一个新查询(可以在同一个事务中),然后将看到新提交的行。
Orcheck for missing result rows withinthe same query and overwritethose with the brute force trick demonstrated in Alextoni's answer.
或者在同一查询中检查缺失的结果行,并使用Alextoni's answer 中演示的蛮力技巧覆盖那些行。
WITH input_rows(usr, contact, name) AS ( ... ) -- see above
, ins AS (
INSERT INTO chats AS c (usr, contact, name)
SELECT * FROM input_rows
ON CONFLICT (usr, contact) DO NOTHING
RETURNING id, usr, contact -- we need unique columns for later join
)
, sel AS (
SELECT 'i'::"char" AS source -- 'i' for 'inserted'
, id, usr, contact
FROM ins
UNION ALL
SELECT 's'::"char" AS source -- 's' for 'selected'
, c.id, usr, contact
FROM input_rows
JOIN chats c USING (usr, contact)
)
, ups AS ( -- RARE corner case
INSERT INTO chats AS c (usr, contact, name) -- another UPSERT, not just UPDATE
SELECT i.*
FROM input_rows i
LEFT JOIN sel s USING (usr, contact) -- columns of unique index
WHERE s.usr IS NULL -- missing!
ON CONFLICT (usr, contact) DO UPDATE -- we've asked nicely the 1st time ...
SET name = c.name -- ... this time we overwrite with old value
-- SET name = EXCLUDED.name -- alternatively overwrite with *new* value
RETURNING 'u'::"char" AS source -- 'u' for updated
, id --, usr, contact -- return more columns?
)
SELECT source, id FROM sel
UNION ALL
TABLE ups;
It's like the query above, but we add one more step with the CTE ups
, before we return the completeresult set. That last CTE will do nothing most of the time. Only if rows go missing from the returned result, we use brute force.
就像上面的查询,但ups
在返回完整的结果集之前,我们在 CTE 中添加了一个步骤。大多数情况下,最后一个 CTE 什么都不做。只有当返回的结果中缺少行时,我们才使用蛮力。
More overhead, yet. The more conflicts with pre-existing rows, the more likely this will outperform the simple approach.
还有更多的开销。与预先存在的行冲突越多,这越有可能胜过简单的方法。
One side effect: the 2nd UPSERT writes rows out of order, so it re-introduces the possibility of deadlocks (see below) if three or moretransactions writing to the same rows overlap. If that's a problem, you need a different solution.
一个副作用:第二个 UPSERT 会乱序写入行,因此如果三个或更多写入相同行的事务重叠,它会重新引入死锁的可能性(见下文)。如果这是一个问题,您需要不同的解决方案。
Concurrency issue 2
并发问题2
If concurrent transactions can write to involved columns of affected rows, and you have to make sure the rows you found are still there at a later stage in the same transaction, you can lock existing rowscheaply in the CTE ins
(which would otherwise go unlocked) with:
如果并发事务可以写入受影响行的相关列,并且您必须确保在同一事务的稍后阶段您找到的行仍然存在,您可以在 CTE 中廉价地锁定现有行ins
(否则会被解锁)和:
...
ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE -- never executed, but still locks the row
...
And add a locking clause to the SELECT
as well, like FOR UPDATE
.
并在 中添加一个锁定子句SELECT
,例如FOR UPDATE
.
This makes competing write operations wait till the end of the transaction, when all locks are released. So be brief.
这使得竞争的写操作等到事务结束,当所有的锁都被释放时。所以要简短。
More details and explanation:
更多细节和解释:
- How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
- Is SELECT or INSERT in a function prone to race conditions?
Deadlocks?
僵局?
Defend against deadlocksby inserting rows in consistent order. See:
通过以一致的顺序插入行来防止死锁。看:
Data types and casts
数据类型和转换
Existing table as template for data types ...
现有表作为数据类型的模板...
Explicit type casts for the first row of data in the free-standing VALUES
expression may be inconvenient. There are ways around it. You can use any existing relation (table, view, ...) as row template. The target table is the obvious choice for the use case. Input data is coerced to appropriate types automatically, like in the VALUES
clause of an INSERT
:
独立VALUES
表达式中第一行数据的显式类型转换可能不方便。有办法解决它。您可以使用任何现有关系(表、视图、...)作为行模板。目标表是用例的明显选择。输入数据被自动强制转换为适当的类型,就像在VALUES
an的子句中一样INSERT
:
WITH input_rows AS (
(SELECT usr, contact, name FROM chats LIMIT 0) -- only copies column names and types
UNION ALL
VALUES
('foo1', 'bar1', 'bob1') -- no type casts here
, ('foo2', 'bar2', 'bob2')
)
...
This does not work for some data types (explanation in the linked answer at the bottom). The next trick works for alldata types:
这不适用于某些数据类型(底部链接答案中的解释)。下一个技巧适用于所有数据类型:
... and names
...和名字
If you insert whole rows (all columns of the table - or at least a set of leadingcolumns), you can omit column names, too. Assuming table chats
in the example only consists of the 3 columns used in the UPSERT:
如果您插入整行(表的所有列 - 或至少一组前导列),您也可以省略列名。假设chats
示例中的表仅包含 UPSERT 中使用的 3 列:
WITH input_rows AS (
SELECT * FROM (
VALUES
((NULL::chats).*) -- copies whole row definition
('foo1', 'bar1', 'bob1') -- no type casts needed
, ('foo2', 'bar2', 'bob2')
) sub
OFFSET 1
)
...
Detailed explanation and more alternatives:
详细说明和更多选择:
Aside: don't use reserved words like "user"
as identifier. That's a loaded footgun. Use legal, lower-case, unquoted identifiers. I replaced it with usr
.
旁白:不要使用保留字"user"
作为标识符。那是一把上膛的脚踏枪。使用合法的、小写的、不带引号的标识符。我用usr
.
回答by Jaumzera
Upsert, being an extension of the INSERT
query can be defined with two different behaviors in case of a constraint conflict: DO NOTHING
or DO UPDATE
.
Upsert,作为INSERT
查询的扩展,可以在约束冲突的情况下定义为两种不同的行为:DO NOTHING
或DO UPDATE
。
INSERT INTO upsert_table VALUES (2, 6, 'upserted')
ON CONFLICT DO NOTHING RETURNING *;
id | sub_id | status
----+--------+--------
(0 rows)
Note as well that RETURNING
returns nothing, because no tuples have been inserted. Now with DO UPDATE
, it is possible to perform operations on the tuple there is a conflict with. First note that it is important to define a constraint which will be used to define that there is a conflict.
还要注意RETURNING
,这不返回任何内容,因为没有插入元组。现在有了DO UPDATE
,就可以对存在冲突的元组执行操作。首先请注意,定义一个约束很重要,该约束将用于定义存在冲突。
INSERT INTO upsert_table VALUES (2, 2, 'inserted')
ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
DO UPDATE SET status = 'upserted' RETURNING *;
id | sub_id | status
----+--------+----------
2 | 2 | upserted
(1 row)
回答by Jo?o Haas
For insertions of a single item, I would probably use a coalesce when returning the id:
对于单个项目的插入,我可能会在返回 id 时使用合并:
WITH new_chats AS (
INSERT INTO chats ("user", "contact", "name")
VALUES (, , )
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id
) SELECT COALESCE(
(SELECT id FROM new_chats),
(SELECT id FROM chats WHERE user = AND contact = )
);
回答by ChoNuff
I modified the amazing answer by Erwin Brandstetter, which won't increment the sequence, and also won't write-lock any rows. I'm relatively new to PostgreSQL, so please feel free to let me know if you see any drawbacks to this method:
我修改了 Erwin Brandstetter 的惊人答案,它不会增加序列,也不会写锁定任何行。我对 PostgreSQL 比较陌生,所以如果您发现这种方法有任何缺点,请随时告诉我:
WITH input_rows(usr, contact, name) AS (
VALUES
(text 'foo1', text 'bar1', text 'bob1') -- type casts in first row
, ('foo2', 'bar2', 'bob2')
-- more?
)
, new_rows AS (
SELECT
c.usr
, c.contact
, c.name
, CASE WHEN r.id IS NULL THEN FALSE ELSE TRUE END AS row_exists
FROM input_rows AS r
LEFT JOIN chats AS c ON r.usr=c.usr AND r.contact=c.contact
)
INSERT INTO chats (usr, contact, name)
SELECT usr, contact, name
FROM new_rows
WHERE row_exists=FALSE
RETURNING id, usr, contact, name
This assumes that the table chats
has a unique constraint on columns (usr, contact)
.
这假设表chats
对列有唯一约束(usr, contact)
。