SQL: Find duplicate rows with PostgreSQL
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14471179/
Find duplicate rows with PostgreSQL
Asked by schlubbi
We have a table of photos with the following columns:
id, merchant_id, url
This table contains duplicate values for the combination merchant_id, url, so it's possible that one row appears several times.
234 some_merchant http://www.some-image-url.com/abscde1213
235 some_merchant http://www.some-image-url.com/abscde1213
236 some_merchant http://www.some-image-url.com/abscde1213
What is the best way to delete those duplications? (I use PostgreSQL 9.2 and Rails 3.)
Answered by MatthewJ
Here is my take on it.
SELECT * FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS Row
    FROM Photos
) dups
WHERE dups.Row > 1
Feel free to play with the order by to tailor the records you want to delete to your specification.
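The query above only selects the duplicate rows. One way to turn it into an actual delete is sketched below. It assumes id is unique, builds a throwaway temp table from the question's sample rows (the column types are a guess), and uses the alias rn, which is my own choice and not part of the original answer.

```sql
-- Hypothetical sample table mirroring the question's columns; types are assumed.
CREATE TEMP TABLE Photos (id integer PRIMARY KEY, merchant_id text, url text);
INSERT INTO Photos VALUES
    (234, 'some_merchant', 'http://www.some-image-url.com/abscde1213'),
    (235, 'some_merchant', 'http://www.some-image-url.com/abscde1213'),
    (236, 'some_merchant', 'http://www.some-image-url.com/abscde1213');

-- Rank rows within each (merchant_id, url) group by id,
-- then delete every row ranked after the first.
DELETE FROM Photos
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS rn
        FROM Photos
    ) dups
    WHERE dups.rn > 1
);

-- Only the row with the lowest id (234) remains.
SELECT * FROM Photos;
```

Changing ORDER BY id ASC to ORDER BY id DESC would keep the highest id in each group instead.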
SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0
SQL Fiddle for Postgres 9.2 is no longer supported, so the SQL Fiddle has been updated to Postgres 9.3.
Answered by 11101101b
The second part of sgeddes's answer doesn't work on Postgres (the fiddle uses MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1
DELETE FROM Photos AS P1
USING Photos AS P2
WHERE P1.id > P2.id
AND P1.merchant_id = P2.merchant_id
AND P1.url = P2.url;
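If you want to check the result before making it permanent, a rough sketch is to run the same delete inside a transaction and then look for any groups that still have more than one row. This assumes the Photos table from the question already exists; the copies alias is just for illustration.

```sql
BEGIN;

-- Same self-join delete as above: drop every row that has a
-- lower-id twin with the same merchant_id and url.
DELETE FROM Photos AS P1
USING Photos AS P2
WHERE P1.id > P2.id
  AND P1.merchant_id = P2.merchant_id
  AND P1.url = P2.url;

-- List any (merchant_id, url) pairs that still occur more than once.
-- An empty result means no duplicates are left.
SELECT merchant_id, url, COUNT(*) AS copies
FROM Photos
GROUP BY merchant_id, url
HAVING COUNT(*) > 1;

-- Replace COMMIT with ROLLBACK to undo the delete instead.
COMMIT;
```

One thing to watch: the = comparisons never match NULL, so rows where merchant_id or url is NULL are not removed by the delete, while GROUP BY does group NULLs together and would still report them in the check.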
Answered by sgeddes
I see a couple of options for you.
For a quick way of doing it, use something like this (it assumes your ID column is not unique as you mention 234 multiple times above):
CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;
Here is the SQL Fiddle.
You will need to add your constraints back to the table if you have any.
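As a rough illustration of adding constraints back: CREATE TABLE ... AS copies only the columns and data, so indexes, constraints and column defaults are not carried over. The sketch below assumes the original table had a primary key on id and nothing else; adjust it to whatever your real table had. In PostgreSQL these statements can all run in one transaction, so a failure leaves the original table in place.

```sql
BEGIN;

-- Rebuild the table from its distinct rows.
CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;

-- Assumption: the original table had a primary key on id.
-- This fails (and aborts the transaction) if duplicate ids remain,
-- for example the same id stored with two different urls.
ALTER TABLE Photos ADD PRIMARY KEY (id);

COMMIT;
```

Note that DROP TABLE will refuse to run if other tables have foreign keys pointing at Photos, unless CASCADE is used, which also drops those foreign key constraints.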
If your ID column is unique, you could do something like this to keep your lowest id:
DELETE FROM P1
USING Photos P1, Photos P2
WHERE P1.id > P2.id
AND P1.merchant_id = P2.merchant_id
AND P1.url = P2.url;
And the Fiddle.