postgresql 如何在 RETURNING from INSERT ... ON CONFLICT 中包含排除的行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/35949877/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 02:11:56  来源:igfitidea点击:

How to include excluded rows in RETURNING from INSERT ... ON CONFLICT

sqldjangopostgresqlupsertsql-returning

提问by Dustin Wyatt

I've got this table (generated by Django):

我有这张表(由 Django 生成):

CREATE TABLE feeds_person (
  id serial PRIMARY KEY,
  created timestamp with time zone NOT NULL,
  modified timestamp with time zone NOT NULL,
  name character varying(4000) NOT NULL,
  url character varying(1000) NOT NULL,
  email character varying(254) NOT NULL,
  CONSTRAINT feeds_person_name_ad8c7469_uniq UNIQUE (name, url, email)
);

I'm trying to bulk insert a lot of data using INSERTwith an ON CONFLICTclause.

我想批量插入大量使用数据INSERTON CONFLICT条款。

The wrinkle is that I need to get the idback for allof the rows, whether they're already existing or not.

问题是我需要id返回所有行,无论它们是否已经存在。

In other cases, I would do something like:

在其他情况下,我会做类似的事情:

INSERT INTO feeds_person (created, modified, name, url, email)
VALUES blah blah blah
ON CONFLICT (name, url, email) DO UPDATE SET url = feeds_person.url
RETURNING id

Doing the UPDATEcauses the statement to return the idof that row. Except, it doesn't work with this table. I thinkit doesn't work because I've got multiple fields unique together whereas in other instances I've used this method I've had just one unique field.

这样做UPDATE会导致语句返回该id行的 。除了,它不适用于此表。我认为它不起作用,因为我有多个唯一的字段,而在其他情况下,我使用了这种方法,我只有一个唯一的字段。

I get this error when trying to run the SQL through Django's cursor:

尝试通过 Django 的游标运行 SQL 时出现此错误:

django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
django.db.utils.ProgrammingError: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

How do I do the bulk insert with this table and get back the inserted and existing ids?

如何使用此表进行批量插入并取回插入的和现有的 ID?

回答by Erwin Brandstetter

The error you get:

你得到的错误:

ON CONFLICT DO UPDATE command cannot affect row a second time

ON CONFLICT DO UPDATE 命令不能再次影响行

indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email)in your VALUESlist. Fold duplicates (if that's an option) and it should work. But you will have to decide which row to pick from each set of dupes.

表示您尝试在单个命令中多次插入同一行。换句话说:你(name, url, email)VALUES名单上有被骗者。折叠重复项(如果这是一个选项),它应该可以工作。但是您必须决定从每组欺骗中选择哪一行。

INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM  (
   VALUES
   ('blah', 'blah', 'blah', 'blah', 'blah')
   -- ... more
   ) v(created, modified, name, url, email)  -- match column list
ON     CONFLICT (name, url, email) DO UPDATE
SET    url = feeds_person.url
RETURNING id;

Since we use a free-standing VALUESexpression now, you have to add explicit type casts for non-default types. Like:

由于我们现在使用独立VALUES表达式,因此您必须为非默认类型添加显式类型转换。喜欢:

VALUES
    (timestamptz '2016-03-12 02:47:56+01'
   , timestamptz '2016-03-12 02:47:56+01'
   , 'n3', 'u3', 'e3')
   ...

Your timestamptzcolumns need an explicit type cast, while the string types can operate with default text. (You could still cast to varchar(n)right away.)

您的timestamptz列需要显式类型转换,而字符串类型可以使用 default 操作text。(你仍然可以立即投射到varchar(n)。)

There are ways to determine which row to pick from each set of dupes:

有多种方法可以确定从每组欺骗中选择哪一行:

You are right, there is (currently) no way to get excludedrows in the RETURNINGclause. I quote the Postgres Wiki:

你是对的,(目前)没有办法在子句中获得排除的RETURNING。我引用Postgres Wiki

Note that RETURNINGdoes not make visible the "EXCLUDED.*" alias from the UPDATE(just the generic "TARGET.*" alias is visible there). Doing so is thought to create annoying ambiguity for the simple, common cases [30]for little to no benefit. At some point in the future, we may pursue a way of exposing if RETURNING-projected tuples were inserted and updated, but this probably doesn't need to make it into the first committed iteration of the feature [31].

请注意,RETURNING不会使“ EXCLUDED.*”别名可见UPDATE(只有通用的“ TARGET.*”别名在那里可见)。这样做被认为会为简单、常见的情况[30]产生令人讨厌的歧义,而几乎没有好处。在未来的某个时候,我们可能会寻求一种公开 if RETURNING-projected tuples 插入和更新的方法,但这可能不需要将其纳入功能的第一次提交迭代[31]

However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates - and might have unintended side effects. You don't strictly need UPSERT to begin with, your case looks more like "SELECT or INSERT". Related:

但是,您不应该更新不应该更新的行。空更新几乎与常规更新一样昂贵 - 并且可能会产生意想不到的副作用。您一开始并不严格需要 UPSERT,您的案例看起来更像是“SELECT 或 INSERT”。有关的:

Onecleaner way to insert a set of rows would be with data-modifying CTEs:

插入一组行的一种更简洁的方法是使用数据修改 CTE:

WITH val AS (
   SELECT DISTINCT ON (name, url, email) *
   FROM  (
      VALUES 
      (timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
    , ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
      -- more (type cast only needed in 1st row)
      ) v(created, modified, name, url, email)
   )
, ins AS (
   INSERT INTO feeds_person (created, modified, name, url, email)
   SELECT created, modified, name, url, email FROM val
   ON     CONFLICT (name, url, email) DO NOTHING
   RETURNING id, name, url, email
   )
SELECT 'inserted' AS how, id FROM ins  -- inserted
UNION  ALL
SELECT 'selected' AS how, f.id         -- not inserted
FROM   val v
JOIN   feeds_person f USING (name, url, email);

The added complexity should pay for big tables where INSERTis the rule and SELECTthe exception.

增加的复杂性应该为INSERT规则和SELECT例外的大表付出代价。

Originally, I had added a NOT EXISTSpredicate on the last SELECTto prevent duplicates in the result. But that was redundant. All CTEs of a single query see the same snapshots of tables.The set returned with ON CONFLICT (name, url, email) DO NOTHINGis mutually exclusive to the set returned after the INNER JOINon the same columns.

最初,我NOT EXISTS在最后一个添加了一个谓词SELECT以防止结果中的重复。但那是多余的。单个查询的所有 CTE 都会看到相同的表快照。返回ON CONFLICT (name, url, email) DO NOTHING的集合INNER JOIN与相同列上的返回的集合互斥。

Unfortunately this also opens a tiny window for a race condition. If ...

不幸的是,这也为竞争条件打开了一个小窗口。如果 ...

  • a concurrent transaction inserts conflicting rows
  • has not committed yet
  • but commits eventually
  • 并发事务插入冲突行
  • 还没有承诺
  • 但最终提交

... some rows may be lost.

... 一些行可能会丢失。

You might just INSERT .. ON CONFLICT DO NOTHING, followed by a separate SELECTquery for all rows - within the same transaction to overcome this. Which in turn opens another tiny window for a race conditionif concurrent transactions can commit writes to the table between INSERTand SELECT(in default READ COMMITTEDisolation level). Can be avoided with REPEATABLE READtransaction isolation(or stricter). Or with a (possibly expensive or even unacceptable) write lock on the whole table. You can get any behavior you need, but there may be a price to pay.

您可能只是INSERT .. ON CONFLICT DO NOTHING,然后是SELECT对所有行的单独查询 - 在同一事务中以克服此问题。如果并发事务可以在和之间提交对表的写入(在默认隔离级别),那么这又会为竞争条件打开另一个小窗口。可以通过事务隔离(或更严格)来避免。或者在整个表上使用(可能很昂贵甚至不可接受的)写锁。您可以获得所需的任何行为,但可能需要付出代价。INSERTSELECTREAD COMMITTEDREPEATABLE READ

Related:

有关的: