在 Oracle 中查找重复数据

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/7771552/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 00:22:28  来源:igfitidea点击:

Finding Duplicate Data in Oracle

databaseoracleduplicatesoracle11g

提问by Ben Hubbard

I have a table with 500,000+ records, and fields for ID, first name, last name, and email address. What I'm trying to do is find rows where the first name AND last name are both duplicates (as in the same person has two separate IDs, email addresses, or whatever, they're in the table more than once). I think I know how to find the duplicates using GROUP BY, this is what I have:

我有一个包含 500,000 多条记录的表格,以及 ID、名字、姓氏和电子邮件地址的字段。我想要做的是找到名字和姓氏都重复的行(因为同一个人有两个单独的 ID、电子邮件地址或其他任何东西,它们在表中不止一次)。我想我知道如何使用 GROUP BY 查找重复项,这就是我所拥有的:

SELECT first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1

The problem is that I need to then move the entire row with these duplicated names into a different table. Is there a way to find the duplicates and get the whole row? Or at least to get the IDs as well? I tried using a self-join, but got back more rows than were in the table to begin with. Would that be a better approach? Any help would be greatly appreciated.

问题是我需要将具有这些重复名称的整行移动到不同的表中。有没有办法找到重复项并获取整行?或者至少也能获得 ID?我尝试使用自联接,但返回的行数比表中的行数多。那会是更好的方法吗?任何帮助将不胜感激。

回答by Ollie

The most effective way to remove duplicate rows is with a self-join:

删除重复行的最有效方法是使用自联接:

DELETE FROM person_table a
 WHERE a.rowid > 
       ANY (SELECT b.rowid
              FROM person_table b
             WHERE a.first_name = b.first_name
               AND a.last_name  = b.last_name);

This will remove all duplicates even if there are more than one duplicate row.

即使有多个重复行,这也将删除所有重复项。

There is more on removing duplicates and differing methods here: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm

这里有更多关于删除重复项和不同方法的信息:http: //www.dba-oracle.com/t_delete_duplicate_table_rows.htm

Hope it helps...

希望能帮助到你...

EDIT: As per your comments, if you want to select all but one of the duplicates then

编辑:根据您的评论,如果您想选择除一个重复项之外的所有项,那么

SELECT *
  FROM person_table a
 WHERE a.rowid > 
       ANY (SELECT b.rowid
              FROM person_table b
             WHERE a.first_name = b.first_name
               AND a.last_name  = b.last_name);

回答by ypercube??

An index on (first_name, last_name)or on (last_name, first_name)would help:

(first_name, last_name)或上的索引(last_name, first_name)会有所帮助:

SELECT t.*
FROM 
    person_table t
  JOIN      
      ( SELECT first_name, last_name
        FROM person_table
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
      ) dup
    ON  dup.last_name = t.last_name
    AND dup.first_name = t.first_name

or:

或者:

SELECT t.*
FROM person_table t
WHERE EXISTS
      ( SELECT *
        FROM person_table dup
        WHERE dup.last_name = t.last_name
          AND dup.first_name = t.first_name
          AND dup.ID <> t.ID
      )

回答by Aitor

To add another option, I usually use this one to remove duplicates:

要添加另一个选项,我通常使用这个来删除重复项:

delete from person_table
where rowid in (select rid
                  from (select rowid rid, row_number() over
                         (partition by first_name,last_name order by rowid) rn
                          from person_table
                        )
                 where rn <> 1 )

回答by michael667

This will give you an ID you want to move/delete/etc. Note that it does not work if count(*) > 2, as you get only 1 ID (you could re-run your query for these cases).

这将为您提供要移动/删除/等的 ID。请注意,如果 count(*) > 2,它不起作用,因为您只获得 1 个 ID(您可以针对这些情况重新运行查询)。

SELECT max(ID), first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1

Edit:You can use COLLECTto get all IDs at once (but be careful, as you only want to move/delete all but one)

编辑:您可以使用COLLECT一次获取所有 ID(但要小心,因为您只想移动/删除除一个之外的所有 ID )