在 Oracle 中查找重复数据
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7771552/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Finding Duplicate Data in Oracle
提问by Ben Hubbard
I have a table with 500,000+ records, and fields for ID, first name, last name, and email address. What I'm trying to do is find rows where the first name AND last name are both duplicates (as in the same person has two separate IDs, email addresses, or whatever, they're in the table more than once). I think I know how to find the duplicates using GROUP BY, this is what I have:
我有一个包含 500,000 多条记录的表格,以及 ID、名字、姓氏和电子邮件地址的字段。我想要做的是找到名字和姓氏都重复的行(因为同一个人有两个单独的 ID、电子邮件地址或其他任何东西,它们在表中不止一次)。我想我知道如何使用 GROUP BY 查找重复项,这就是我所拥有的:
SELECT first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
The problem is that I need to then move the entire row with these duplicated names into a different table. Is there a way to find the duplicates and get the whole row? Or at least to get the IDs as well? I tried using a self-join, but got back more rows than were in the table to begin with. Would that be a better approach? Any help would be greatly appreciated.
问题是我需要将具有这些重复名称的整行移动到不同的表中。有没有办法找到重复项并获取整行?或者至少也能获得 ID?我尝试使用自联接,但返回的行数比表中的行数多。那会是更好的方法吗?任何帮助将不胜感激。
回答by Ollie
The most effective way to remove duplicate rows is with a self-join:
删除重复行的最有效方法是使用自联接:
DELETE FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
This will remove all duplicates even if there are more than one duplicate row.
即使有多个重复行,这也将删除所有重复项。
There is more on removing duplicates and differing methods here: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
这里有更多关于删除重复项和不同方法的信息:http: //www.dba-oracle.com/t_delete_duplicate_table_rows.htm
Hope it helps...
希望能帮助到你...
EDIT: As per your comments, if you want to select all but one of the duplicates then
编辑:根据您的评论,如果您想选择除一个重复项之外的所有项,那么
SELECT *
FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
回答by ypercube??
An index on (first_name, last_name)
or on (last_name, first_name)
would help:
上(first_name, last_name)
或上的索引(last_name, first_name)
会有所帮助:
SELECT t.*
FROM
person_table t
JOIN
( SELECT first_name, last_name
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) dup
ON dup.last_name = t.last_name
AND dup.first_name = t.first_name
or:
或者:
SELECT t.*
FROM person_table t
WHERE EXISTS
( SELECT *
FROM person_table dup
WHERE dup.last_name = t.last_name
AND dup.first_name = t.first_name
AND dup.ID <> t.ID
)
回答by Aitor
To add another option, I usually use this one to remove duplicates:
要添加另一个选项,我通常使用这个来删除重复项:
delete from person_table
where rowid in (select rid
from (select rowid rid, row_number() over
(partition by first_name,last_name order by rowid) rn
from person_table
)
where rn <> 1 )
回答by michael667
This will give you an ID you want to move/delete/etc. Note that it does not work if count(*) > 2, as you get only 1 ID (you could re-run your query for these cases).
这将为您提供要移动/删除/等的 ID。请注意,如果 count(*) > 2,它不起作用,因为您只获得 1 个 ID(您可以针对这些情况重新运行查询)。
SELECT max(ID), first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
Edit:You can use COLLECTto get all IDs at once (but be careful, as you only want to move/delete all but one)
编辑:您可以使用COLLECT一次获取所有 ID(但要小心,因为您只想移动/删除除一个之外的所有 ID )