SQL 从 Oracle 表中删除重复行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/529098/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 01:04:00  来源:igfitidea点击:

Removing duplicate rows from table in Oracle

sqloracleduplicatesdelete-row

提问by juan

I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

我正在 Oracle 中测试某些内容并使用一些示例数据填充表,但在此过程中我不小心加载了重复记录,因此现在我无法使用某些列创建主键。

How can I delete all duplicate rows and leave only one of them?

如何删除所有重复行并只保留其中之一?

回答by Bill the Lizard

Use the rowidpseudocolumn.

使用rowid伪列。

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Where column1, column2, and column3make up the identifying key for each record. You might list all your columns.

其中column1column2、 和column3组成每条记录的标识键。您可能会列出所有列。

回答by Dead Programmer

From Ask Tom

问汤姆

delete from t
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         companyid, agentid, class , status, terminationdate
                                   order by rowid) rn
                            from t)
                   where rn <> 1);

(fixed the missing parenthesis)

(修正了缺少的括号)

回答by Dead Programmer

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2)

回答by Mark

From DevX.com:

来自DevX.com

DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...) ;

Where column1, column2, etc. is the key you want to use.

其中 column1、column2 等是您要使用的键。

回答by DoOrDie

Solution 1)

解决方案 1)

delete from emp
where rowid not in
(select max(rowid) from emp group by empno);

Solution 2)

解决方案2)

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

Solution 3)

解决方案 3)

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno ); 

回答by Mohammed khaled

create table t2 as select distinct * from t1;

创建表 t2 作为 select distinct * from t1;

回答by Nick

You should do a small pl/sql block using a cursor for loop and delete the rows you don't want to keep. For instance:

您应该使用游标 for 循环执行一个小的 pl/sql 块并删除您不想保留的行。例如:

declare
prev_var my_table.var1%TYPE;

begin

for t in (select var1 from my_table order by var 1) LOOP

-- if previous var equal current var, delete the row, else keep on going.
end loop;

end;

回答by user1799846

To select the duplicates only the query format can be:

要选择重复项,只有查询格式可以是:

SELECT GroupFunction(column1), GroupFunction(column2),..., 
COUNT(column1), column1, column2...
FROM our_table
GROUP BY column1, column2, column3...
HAVING COUNT(column1) > 1

So the correct query as per other suggestion is:

因此,根据其他建议的正确查询是:

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2
                              AND ....so on.. to identify the duplicate rows....)

This query will keep the oldest record in the database for the criteria chosen in the WHERE CLAUSE.

此查询将保留数据库中针对WHERE CLAUSE.

Oracle Certified Associate (2008)

Oracle 认证助理(2008 年)

回答by user2158672

The Fastest way for really big tables

真正大桌子的最快方法

  1. Create exception table with structure below: exceptions_table

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)
    
  2. Try create a unique constraint or primary key which will be violated by the duplicates. You will get an error message because you have duplicates. The exceptions table will contain the rowids for the duplicate rows.

    alter table add constraint
    unique --or primary key
    (dupfield1,dupfield2) exceptions into exceptions_table;
    
  3. Join your table with exceptions_table by rowid and delete dups

    delete original_dups where rowid in (select ROW_ID from exceptions_table);
    
  4. If the amount of rows to delete is big, then create a new table (with all grants and indexes) anti-joining with exceptions_table by rowid and rename the original table into original_dups table and rename new_table_with_no_dups into original table

    create table new_table_with_no_dups AS (
        select field1, field2 ........ 
        from original_dups t1
        where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
    )
    
  1. 创建具有以下结构的异常表:exceptions_table

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)
    
  2. 尝试创建一个唯一约束或主键,重复项将违反该约束或主键。您将收到一条错误消息,因为您有重复项。例外表将包含重复行的 rowid。

    alter table add constraint
    unique --or primary key
    (dupfield1,dupfield2) exceptions into exceptions_table;
    
  3. 通过 rowid 将您的表与 exceptions_table 连接起来并删除重复项

    delete original_dups where rowid in (select ROW_ID from exceptions_table);
    
  4. 如果要删除的行数很大,则创建一个新表(包含所有授权和索引)通过rowid与exceptions_table反连接并将原始表重命名为original_dups表并将new_table_with_no_dups重命名为原始表

    create table new_table_with_no_dups AS (
        select field1, field2 ........ 
        from original_dups t1
        where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
    )
    

回答by Dnyaneshwar Tandale

Using rowid-

使用 rowid-

delete from emp
 where rowid not in
 (select max(rowid) from emp group by empno);

Using self join-

使用自连接

delete from emp e1
 where rowid not in
 (select max(rowid) from emp e2
 where e1.empno = e2.empno );