SQL 删除 PostgreSQL 中的重复记录

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6583916/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 11:12:34  来源:igfitidea点击:

Delete Duplicate Records in PostgreSQL

sqlpostgresqlduplicates

提问by André Moruj?o

I have a table in a PostgreSQL 8.3.8 database, which has no keys/constraints on it, and has multiple rows with exactly the same values.

我在 PostgreSQL 8.3.8 数据库中有一个表,它没有键/约束,并且有多行具有完全相同的值。

I would like to remove all duplicates and keep only 1 copy of each row.

我想删除所有重复项,每行只保留 ​​1 个副本。

There is one column in particular (named "key") which may be used to identify duplicates (i.e. there should only exist one entry for each distinct "key").

特别有一个列(名为“键”)可用于识别重复项(即每个不同的“键”应该只存在一个条目)。

How can I do this? (ideally with a single SQL command) Speed is not a problem in this case (there are only a few rows).

我怎样才能做到这一点?(最好使用单个 SQL 命令)在这种情况下速度不是问题(只有几行)。

采纳答案by a_horse_with_no_name

DELETE FROM dupes a
WHERE a.ctid <> (SELECT min(b.ctid)
                 FROM   dupes b
                 WHERE  a.key = b.key);

回答by rapimo

A faster solution is

更快的解决方案是

DELETE FROM dups a USING (
      SELECT MIN(ctid) as ctid, key
        FROM dups 
        GROUP BY key HAVING COUNT(*) > 1
      ) b
      WHERE a.key = b.key 
      AND a.ctid <> b.ctid

回答by isapir

This is fast and concise:

这是快速而简洁的:

DELETE FROM dupes T1
    USING   dupes T2
WHERE   T1.ctid < T2.ctid  -- delete the older versions
    AND T1.key  = T2.key;  -- add more columns if needed

See also my answer at How to delete duplicate rows without unique identifierwhich includes more information.

另请参阅我在如何删除没有唯一标识符的重复行中的回答,其中包含更多信息。

回答by Radu Gabriel

I tried this:

我试过这个:

DELETE FROM tablename
WHERE id IN (SELECT id
              FROM (SELECT id,
                             ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
                     FROM tablename) t
              WHERE t.rnum > 1);

provided by Postgres wiki:

Postgres wiki 提供:

https://wiki.postgresql.org/wiki/Deleting_duplicates

https://wiki.postgresql.org/wiki/Deleting_duplicates

回答by expert

I had to create my own version. Version written by @a_horse_with_no_name is way too slow on my table (21M rows). And @rapimo simply doesn't delete dups.

我必须创建自己的版本。@a_horse_with_no_name 编写的版本在我的表(21M 行)上太慢了。而且@rapimo 根本不会删除重复项。

Here is what I use on PostgreSQL 9.5

这是我在 PostgreSQL 9.5 上使用的

DELETE FROM your_table
WHERE ctid IN (
  SELECT unnest(array_remove(all_ctids, actid))
  FROM (
         SELECT
           min(b.ctid)     AS actid,
           array_agg(ctid) AS all_ctids
         FROM your_table b
         GROUP BY key1, key2, key3, key4
         HAVING count(*) > 1) c);

回答by Pablo Santa Cruz

I would use a temporary table:

我会使用临时表:

create table tab_temp as
select distinct f1, f2, f3, fn
  from tab;

Then, delete taband rename tab_tempinto tab.

然后,删除tab并重命名tab_temptab.

回答by Beanwah

This worked well for me. I had a table, terms, that contained duplicate values. Ran a query to populate a temp table with all of the duplicate rows. Then I ran the a delete statement with those ids in the temp table. value is the column that contained the duplicates.

这对我来说效果很好。我有一个包含重复值的表项。运行查询以使用所有重复行填充临时表。然后我在临时表中使用这些 id 运行了删除语句。value 是包含重复项的列。

        CREATE TEMP TABLE dupids AS
        select id from (
                    select value, id, row_number() 
over (partition by value order by value) 
    as rownum from terms
                  ) tmp
                  where rownum >= 2;

delete from [table] where id in (select id from dupids)

回答by Zaytsev Dmitry

Another approach (works only if you have any unique field like idin your table) to find all unique ids by columns and remove other ids that are not in unique list

另一种方法(仅当您的表中有任何唯一字段时才有效id)按列查找所有唯一 ID 并删除不在唯一列表中的其他 ID

DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id FROM users);

回答by Barrie Walker

How about:

怎么样:

WITH
  u AS (SELECT DISTINCT * FROM your_table),
  x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;

I had been concerned about execution order, would the DELETE happen before the SELECT DISTINCT, but it works fine for me. And has the added bonus of not needing any knowledge about the table structure.

我一直担心执行顺序,DELETE 是否会在 SELECT DISTINCT 之前发生,但它对我来说很好用。并且具有不需要任何有关表结构的知识的额外好处。