SQL: ignore duplicate keys during 'COPY FROM' in PostgreSQL

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/13947327/

Date: 2020-09-01 12:42:06  Source: igfitidea

To ignore duplicate keys during 'copy from' in postgresql

Tags: sql, postgresql

Asked by Kam

I have to dump a large amount of data from a file into a PostgreSQL table. I know it does not support 'IGNORE' or 'REPLACE' as MySQL does. Almost all posts on the web regarding this suggest the same thing: dump the data into a temp table and then do an 'INSERT ... SELECT ... WHERE NOT EXISTS ...'.


This will not help in the case where the file data itself contains duplicate primary keys. Does anybody have an idea how to handle this in PostgreSQL?


P.S. I am doing this from a Java program, if that helps.

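Since the file itself contains duplicate primary keys, one option is to deduplicate it client-side before handing it to COPY at all. A minimal sketch of a keep-first-row-per-key pass over CSV data (the column layout and `key_index` are assumptions for illustration, not from the question):

```python
import csv
import io

def dedupe_by_key(rows, key_index=0):
    """Keep only the first row seen for each primary-key value."""
    seen = set()
    out = []
    for row in rows:
        key = row[key_index]
        if key not in seen:      # later rows with a duplicate key are dropped
            seen.add(key)
            out.append(row)
    return out

# Example: the key "1" appears twice; only the first copy survives.
raw = "1,alice\n2,bob\n1,alice-again\n"
rows = list(csv.reader(io.StringIO(raw)))
deduped = dedupe_by_key(rows)
# deduped == [["1", "alice"], ["2", "bob"]]
```

Note this only removes duplicates within the file; collisions with rows already in the table still need one of the server-side approaches below.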

Answered by Igor Romanchenko

Use the same approach as you described, but DELETE (or group, or modify ...) the duplicate PKs in the temp table before loading into the main table.


Something like:


CREATE TEMP TABLE tmp_table 
ON COMMIT DROP
AS
SELECT * 
FROM main_table
WITH NO DATA;

COPY tmp_table FROM 'full/file/name/here';

INSERT INTO main_table
SELECT DISTINCT ON (PK_field) *
FROM tmp_table
-- the ORDER BY must start with the DISTINCT ON expression(s)
ORDER BY PK_field, some_fields;

Details: CREATE TABLE AS, COPY, DISTINCT ON

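The key point is that DISTINCT ON keeps the first row of each group in ORDER BY order, so the ORDER BY decides which duplicate wins. A rough Python analogue of that keep-first behaviour (the field names `pk`, `priority`, `val` are made up for illustration):

```python
# Rough analogue of: SELECT DISTINCT ON (pk) * FROM tmp ORDER BY pk, priority;
rows = [
    {"pk": 1, "priority": 2, "val": "old"},
    {"pk": 1, "priority": 1, "val": "preferred"},
    {"pk": 2, "priority": 1, "val": "only"},
]
rows.sort(key=lambda r: (r["pk"], r["priority"]))  # ORDER BY pk, priority
result = {}
for r in rows:
    result.setdefault(r["pk"], r)  # DISTINCT ON keeps the first row per pk
picked = list(result.values())
# picked contains "preferred" (lowest priority for pk 1) and "only"
```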

Answered by Alan Simmons

PostgreSQL 9.5 now has upsert functionality. You can follow Igor's instructions, except that the final INSERT includes the clause ON CONFLICT DO NOTHING.


INSERT INTO main_table
SELECT *
FROM tmp_table
ON CONFLICT DO NOTHING;
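The semantics of ON CONFLICT DO NOTHING are that existing rows win and conflicting incoming rows are silently skipped. A small sketch of that behaviour, with a dict standing in for the target table (the key/value layout is an assumption for illustration):

```python
# Sketch of ON CONFLICT DO NOTHING: rows whose key already exists
# in the target table are silently skipped.
main_table = {1: "existing"}              # key -> row, stands in for main_table
tmp_rows = [(1, "incoming-dup"), (2, "new")]

for key, value in tmp_rows:
    if key not in main_table:             # ON CONFLICT (key) DO NOTHING
        main_table[key] = value

# main_table == {1: "existing", 2: "new"}
```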

Answered by Denis Drescher

Igor's answer helped me a lot, but I also ran into the problem Nate mentioned in his comment. Then, perhaps in addition to the question here, I had the problem that the new data not only contained internal duplicates but also rows duplicating the existing data. What worked for me was the following.


CREATE TEMP TABLE tmp_table AS SELECT * FROM newsletter_subscribers;
COPY tmp_table (name, email) FROM stdin DELIMITER ' ' CSV;
SELECT count(*) FROM tmp_table;  -- Just to be sure
TRUNCATE newsletter_subscribers;
INSERT INTO newsletter_subscribers
    SELECT DISTINCT ON (email) * FROM tmp_table
    ORDER BY email, subscription_status;
SELECT count(*) FROM newsletter_subscribers;  -- Paranoid again

Both internal and external duplicates become the same rows in tmp_table, and the DISTINCT ON (email) part then removes them. The ORDER BY makes sure that the desired row comes first in the result set, and DISTINCT then discards all further rows.


Answered by Jester

Insert into a temp table grouped by the key, so you get rid of the duplicates,


and then insert if not exists.
