SQL 删除 PostgreSQL 中的重复记录
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6583916/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Delete Duplicate Records in PostgreSQL
提问by André Morujão
I have a table in a PostgreSQL 8.3.8 database, which has no keys/constraints on it, and has multiple rows with exactly the same values.
我在 PostgreSQL 8.3.8 数据库中有一个表,它没有键/约束,并且有多行具有完全相同的值。
I would like to remove all duplicates and keep only 1 copy of each row.
我想删除所有重复项,每行只保留 1 个副本。
There is one column in particular (named "key") which may be used to identify duplicates (i.e. there should only exist one entry for each distinct "key").
特别有一个列(名为“键”)可用于识别重复项(即每个不同的“键”应该只存在一个条目)。
How can I do this? (ideally with a single SQL command) Speed is not a problem in this case (there are only a few rows).
我怎样才能做到这一点?(最好使用单个 SQL 命令)在这种情况下速度不是问题(只有几行)。
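For reference in the answers below, a minimal setup might look like this (a sketch: the table name dupes and the value column are illustrative, the question itself only specifies a column named "key"):
作为下文各答案的参考，一个最小的示例环境大致如下（仅为示意：表名 dupes 和 value 列是假设的，问题本身只提到名为 "key" 的列）：
CREATE TABLE dupes (key integer, value text);   -- no primary key or constraints
INSERT INTO dupes VALUES
    (1, 'a'), (1, 'a'),
    (2, 'b'), (2, 'b'), (2, 'b'),
    (3, 'c');
-- goal: keep exactly one row per distinct "key"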
采纳答案by a_horse_with_no_name
-- ctid is a hidden system column identifying each physical row;
-- for every key, keep the row with the smallest ctid and delete the rest
DELETE FROM dupes a
WHERE a.ctid <> (SELECT min(b.ctid)
                 FROM   dupes b
                 WHERE  a.key = b.key);
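A quick sanity check (a sketch, reusing the dupes table and key column from the question): this query should return no rows once the duplicates have been removed.
一个简单的检查方法（示意，沿用问题中的 dupes 表和 key 列）：去重完成后，下面的查询应当不返回任何行。
SELECT key, count(*)
FROM dupes
GROUP BY key
HAVING count(*) > 1;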
回答by rapimo
A faster solution is
更快的解决方案是
DELETE FROM dups a USING (
    SELECT MIN(ctid) as ctid, key
    FROM dups
    GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
  AND a.ctid <> b.ctid
回答by isapir
This is fast and concise:
这是快速而简洁的:
DELETE FROM dupes T1
USING dupes T2
WHERE T1.ctid < T2.ctid -- delete the older versions
AND T1.key = T2.key; -- add more columns if needed
See also my answer at How to delete duplicate rows without unique identifier, which includes more information.
另请参阅我在如何删除没有唯一标识符的重复行中的回答,其中包含更多信息。
回答by Radu Gabriel
I tried this:
我试过这个:
DELETE FROM tablename
WHERE id IN (SELECT id
             FROM (SELECT id,
                          ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY id) AS rnum
                   FROM tablename) t
             WHERE t.rnum > 1);
This is provided by the Postgres wiki.
该语句由 Postgres wiki 提供。
回答by expert
I had to create my own version. The version written by @a_horse_with_no_name is way too slow on my table (21M rows), and @rapimo's simply doesn't delete dups.
我必须创建自己的版本。@a_horse_with_no_name 编写的版本在我的表(21M 行)上太慢了。而且@rapimo 根本不会删除重复项。
Here is what I use on PostgreSQL 9.5:
这是我在 PostgreSQL 9.5 上使用的语句：
DELETE FROM your_table
WHERE ctid IN (
    SELECT unnest(array_remove(all_ctids, actid))
    FROM (
        SELECT min(b.ctid)     AS actid,
               array_agg(ctid) AS all_ctids
        FROM your_table b
        GROUP BY key1, key2, key3, key4
        HAVING count(*) > 1) c);
回答by Pablo Santa Cruz
I would use a temporary table:
我会使用临时表:
create table tab_temp as
select distinct f1, f2, f3, fn
from tab;
Then, delete tab and rename tab_temp into tab.
然后，删除 tab，并把 tab_temp 重命名为 tab。
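A minimal sketch of that last step, assuming the names used above and that nothing else (views, foreign keys) depends on tab; note that CREATE TABLE AS does not carry over indexes, constraints, or permissions, so those would need to be recreated:
最后一步的简单示意，假设沿用上面的表名，并且没有视图、外键等依赖 tab；注意 CREATE TABLE AS 不会带上索引、约束和权限，需要另行重建：
DROP TABLE tab;
ALTER TABLE tab_temp RENAME TO tab;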
回答by Beanwah
This worked well for me. I had a table, terms, that contained duplicate values. I ran a query to populate a temp table with all of the duplicate rows, then ran a delete statement with those ids from the temp table. value is the column that contained the duplicates.
这对我来说效果很好。我有一个名为 terms 的表，其中包含重复值。我先运行一个查询，把所有重复的行填充到一个临时表中，然后用临时表中的这些 id 执行删除语句。value 是包含重复项的列。
CREATE TEMP TABLE dupids AS
    select id
    from (select value, id,
                 row_number() over (partition by value order by value) as rownum
          from terms) tmp
    where rownum >= 2;

delete from [table] where id in (select id from dupids);
回答by Zaytsev Dmitry
Another approach (works only if you have a unique field like id in your table): find all unique ids by columns and remove the other ids that are not in the unique list.
另一种方法（仅当表中有 id 之类的唯一字段时才有效）：按列找出所有唯一的 id，然后删除不在该唯一列表中的其他 id。
DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id FROM users);
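Worth noting: without an ORDER BY, DISTINCT ON keeps an arbitrary row from each (username, email) group. A small variation (a sketch on the same users table) that deterministically keeps the row with the lowest id:
需要注意：如果不加 ORDER BY，DISTINCT ON 会从每个 (username, email) 分组中保留任意一行。下面的小改动（基于同一个 users 表的示意）可以确定性地保留 id 最小的那一行：
DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id
                       FROM users
                       ORDER BY username, email, id);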
回答by Barrie Walker
How about:
怎么样:
WITH u AS (SELECT DISTINCT * FROM your_table),
     x AS (DELETE FROM your_table)
INSERT INTO your_table
SELECT * FROM u;
I had been concerned about execution order (would the DELETE happen before the SELECT DISTINCT?), but it works fine for me. It has the added bonus of not needing any knowledge about the table structure.
我本来担心执行顺序（DELETE 是否会在 SELECT DISTINCT 之前发生），但它对我来说运行良好。它还有一个额外的好处：不需要了解表结构。
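For what it's worth, data-modifying statements inside WITH require PostgreSQL 9.1 or later, and all parts of the statement see the same snapshot, so the DELETE cannot hide rows from the SELECT DISTINCT. Since this rewrites the whole table, one might run it inside an explicit transaction and inspect the result before committing (a sketch, reusing your_table):
补充一点：WITH 中的数据修改语句需要 PostgreSQL 9.1 及以上版本，而且整条语句的各部分看到的是同一个快照，因此 DELETE 不会先于 SELECT DISTINCT 把行删掉。由于它会重写整张表，可以把它放在显式事务中，确认结果后再提交（示意，沿用 your_table）：
BEGIN;
WITH u AS (SELECT DISTINCT * FROM your_table),
     x AS (DELETE FROM your_table)
INSERT INTO your_table
SELECT * FROM u;
-- inspect the table here, then:
COMMIT;   -- or ROLLBACK; to undo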