postgresql 如何在 RETURNING from INSERT ... ON CONFLICT 中包含排除的行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/35949877/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
提问by Dustin Wyatt
I've got this table (generated by Django):
我有这张表(由 Django 生成):
CREATE TABLE feeds_person (
id serial PRIMARY KEY,
created timestamp with time zone NOT NULL,
modified timestamp with time zone NOT NULL,
name character varying(4000) NOT NULL,
url character varying(1000) NOT NULL,
email character varying(254) NOT NULL,
CONSTRAINT feeds_person_name_ad8c7469_uniq UNIQUE (name, url, email)
);
I'm trying to bulk insert a lot of data using INSERT
with an ON CONFLICT
clause.
我想批量插入大量使用数据INSERT
与ON CONFLICT
条款。
The wrinkle is that I need to get the id
back for allof the rows, whether they're already existing or not.
问题是我需要id
返回所有行,无论它们是否已经存在。
In other cases, I would do something like:
在其他情况下,我会做类似的事情:
INSERT INTO feeds_person (created, modified, name, url, email)
VALUES blah blah blah
ON CONFLICT (name, url, email) DO UPDATE SET url = feeds_person.url
RETURNING id
Doing the UPDATE
causes the statement to return the id
of that row. Except, it doesn't work with this table. I thinkit doesn't work because I've got multiple fields unique together whereas in other instances I've used this method I've had just one unique field.
这样做UPDATE
会导致语句返回该id
行的 。除了,它不适用于此表。我认为它不起作用,因为我有多个唯一的字段,而在其他情况下,我使用了这种方法,我只有一个唯一的字段。
I get this error when trying to run the SQL through Django's cursor:
尝试通过 Django 的游标运行 SQL 时出现此错误:
django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
How do I do the bulk insert with this table and get back the inserted and existing ids?
如何使用此表进行批量插入并取回插入的和现有的 ID?
回答by Erwin Brandstetter
The error you get:
你得到的错误:
ON CONFLICT DO UPDATE command cannot affect row a second time
ON CONFLICT DO UPDATE 命令不能再次影响行
indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email)
in your VALUES
list. Fold duplicates (if that's an option) and it should work. But you will have to decide which row to pick from each set of dupes.
表示您尝试在单个命令中多次插入同一行。换句话说:你(name, url, email)
的VALUES
名单上有被骗者。折叠重复项(如果这是一个选项),它应该可以工作。但是您必须决定从每组欺骗中选择哪一行。
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
('blah', 'blah', 'blah', 'blah', 'blah')
-- ... more
) v(created, modified, name, url, email) -- match column list
ON CONFLICT (name, url, email) DO UPDATE
SET url = feeds_person.url
RETURNING id;
Since we use a free-standing VALUES
expression now, you have to add explicit type casts for non-default types. Like:
由于我们现在使用独立VALUES
表达式,因此您必须为非默认类型添加显式类型转换。喜欢:
VALUES
(timestamptz '2016-03-12 02:47:56+01'
, timestamptz '2016-03-12 02:47:56+01'
, 'n3', 'u3', 'e3')
...
Your timestamptz
columns need an explicit type cast, while the string types can operate with default text
. (You could still cast to varchar(n)
right away.)
您的timestamptz
列需要显式类型转换,而字符串类型可以使用 default 操作text
。(你仍然可以立即投射到varchar(n)
。)
There are ways to determine which row to pick from each set of dupes:
有多种方法可以确定从每组欺骗中选择哪一行:
You are right, there is (currently) no way to get excludedrows in the RETURNING
clause. I quote the Postgres Wiki:
你是对的,(目前)没有办法在子句中获得排除的行RETURNING
。我引用Postgres Wiki:
Note that
RETURNING
does not make visible the "EXCLUDED.*
" alias from theUPDATE
(just the generic "TARGET.*
" alias is visible there). Doing so is thought to create annoying ambiguity for the simple, common cases [30]for little to no benefit. At some point in the future, we may pursue a way of exposing ifRETURNING
-projected tuples were inserted and updated, but this probably doesn't need to make it into the first committed iteration of the feature [31].
请注意,
RETURNING
不会使“EXCLUDED.*
”别名可见UPDATE
(只有通用的“TARGET.*
”别名在那里可见)。这样做被认为会为简单、常见的情况[30]产生令人讨厌的歧义,而几乎没有好处。在未来的某个时候,我们可能会寻求一种公开 ifRETURNING
-projected tuples 插入和更新的方法,但这可能不需要将其纳入功能的第一次提交迭代[31]。
However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates - and might have unintended side effects. You don't strictly need UPSERT to begin with, your case looks more like "SELECT or INSERT". Related:
但是,您不应该更新不应该更新的行。空更新几乎与常规更新一样昂贵 - 并且可能会产生意想不到的副作用。您一开始并不严格需要 UPSERT,您的案例看起来更像是“SELECT 或 INSERT”。有关的:
Onecleaner way to insert a set of rows would be with data-modifying CTEs:
插入一组行的一种更简洁的方法是使用数据修改 CTE:
WITH val AS (
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
(timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
, ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
-- more (type cast only needed in 1st row)
) v(created, modified, name, url, email)
)
, ins AS (
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT created, modified, name, url, email FROM val
ON CONFLICT (name, url, email) DO NOTHING
RETURNING id, name, url, email
)
SELECT 'inserted' AS how, id FROM ins -- inserted
UNION ALL
SELECT 'selected' AS how, f.id -- not inserted
FROM val v
JOIN feeds_person f USING (name, url, email);
The added complexity should pay for big tables where INSERT
is the rule and SELECT
the exception.
增加的复杂性应该为INSERT
规则和SELECT
例外的大表付出代价。
Originally, I had added a NOT EXISTS
predicate on the last SELECT
to prevent duplicates in the result. But that was redundant. All CTEs of a single query see the same snapshots of tables.The set returned with ON CONFLICT (name, url, email) DO NOTHING
is mutually exclusive to the set returned after the INNER JOIN
on the same columns.
最初,我NOT EXISTS
在最后一个添加了一个谓词SELECT
以防止结果中的重复。但那是多余的。单个查询的所有 CTE 都会看到相同的表快照。返回ON CONFLICT (name, url, email) DO NOTHING
的集合INNER JOIN
与相同列上的返回的集合互斥。
Unfortunately this also opens a tiny window for a race condition. If ...
不幸的是,这也为竞争条件打开了一个小窗口。如果 ...
- a concurrent transaction inserts conflicting rows
- has not committed yet
- but commits eventually
- 并发事务插入冲突行
- 还没有承诺
- 但最终提交
... some rows may be lost.
... 一些行可能会丢失。
You might just INSERT .. ON CONFLICT DO NOTHING
, followed by a separate SELECT
query for all rows - within the same transaction to overcome this. Which in turn opens another tiny window for a race conditionif concurrent transactions can commit writes to the table between INSERT
and SELECT
(in default READ COMMITTED
isolation level). Can be avoided with REPEATABLE READ
transaction isolation(or stricter). Or with a (possibly expensive or even unacceptable) write lock on the whole table. You can get any behavior you need, but there may be a price to pay.
您可能只是INSERT .. ON CONFLICT DO NOTHING
,然后是SELECT
对所有行的单独查询 - 在同一事务中以克服此问题。如果并发事务可以在和之间提交对表的写入(在默认隔离级别),那么这又会为竞争条件打开另一个小窗口。可以通过事务隔离(或更严格)来避免。或者在整个表上使用(可能很昂贵甚至不可接受的)写锁。您可以获得所需的任何行为,但可能需要付出代价。INSERT
SELECT
READ COMMITTED
REPEATABLE READ
Related:
有关的: