Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/10087527/

Time: 2020-09-01 15:16:22  Source: igfitidea

Removing duplicate rows from a table in DB2 in a single query

Tags: sql, db2, sql-delete

Asked by Vicky

I have a table with the columns shown below:

one   |   two    |  three  |   name
------------------------------------
 A1       B1          C1        xyz
 A1       B1          C1        pqr      -> should be deleted
 A1       B1          C1        lmn      -> should be deleted
 A2       B2          C2        abc
 A2       B2          C2        def      -> should be deleted
 A3       B3          C3        ghi
------------------------------------ 

The table does not have a primary key column. I have no control over the table, so I cannot add one.

As shown, I want to delete the rows where the combination of columns one, two, and three is the same. So if A1B1C1 occurs three times (as in the example above), two of those rows should be deleted and only one should stay.

How can I achieve this with just one query in DB2?

I need a single query because I will be running it from a Java program.

Answered by bhamby

(This assumes you're on DB2 for Linux/Unix/Windows; other platforms may vary slightly.)

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

This should get you what you're looking for.

The query uses the OLAP function ROWNUMBER() to assign a number to each row within each ONE, TWO, THREE combination. DB2 is then able to match the rows referenced by the fullselect (A) as the rows that the DELETE statement should remove from the table. To be usable as the target of a DELETE, a fullselect has to satisfy the rules for a deletable view (see "deletable view" under the notes section).
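The numbering step can be modelled outside the database. The following is a minimal Python sketch, not DB2 code: the helper names `number_rows` and `drop_duplicates` are made up for illustration, and the sketch assumes rows are numbered in input order. The DB2 query has no ORDER BY inside the OVER clause, so which row of a group actually receives RN = 1 is up to the database.

```python
from collections import defaultdict

# Sample rows from the question: (one, two, three, name)
rows = [
    ("A1", "B1", "C1", "xyz"),
    ("A1", "B1", "C1", "pqr"),
    ("A1", "B1", "C1", "lmn"),
    ("A2", "B2", "C2", "abc"),
    ("A2", "B2", "C2", "def"),
    ("A3", "B3", "C3", "ghi"),
]


def number_rows(rows):
    """Pair each row with its 1-based position inside its (one, two, three) group.

    Hypothetical helper that mimics ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE),
    with input order standing in for the unspecified DB2 ordering.
    """
    seen = defaultdict(int)
    numbered = []
    for row in rows:
        key = row[:3]          # the partition key: columns one, two, three
        seen[key] += 1
        numbered.append((seen[key], row))
    return numbered


def drop_duplicates(rows):
    """Keep only rows numbered 1, i.e. 'delete' every row where RN > 1."""
    return [row for rn, row in number_rows(rows) if rn == 1]


survivors = drop_duplicates(rows)
for row in survivors:
    print(row)
```

Exactly one row per (one, two, three) combination survives, which is the effect the DELETE has on the real table.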

Below is some proof (tested on LUW 9.7):


DECLARE GLOBAL TEMPORARY TABLE SESSION.TEST (
    one CHAR(2),
    two CHAR(2),
    three CHAR(2),
    name CHAR(3)
) ON COMMIT PRESERVE ROWS;

INSERT INTO SESSION.TEST VALUES 
    ('A1', 'B1', 'C1', 'xyz'),
    ('A1', 'B1', 'C1', 'pqr'),
    ('A1', 'B1', 'C1', 'lmn'),
    ('A2', 'B2', 'C2', 'abc'),
    ('A2', 'B2', 'C2', 'def'),
    ('A3', 'B3', 'C3', 'ghi');

DELETE FROM
    (SELECT ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
     FROM SESSION.TEST) AS A
WHERE RN > 1;

SELECT * FROM SESSION.TEST;

Edit 2 March 2017:


In response to the question from Ahmed Anwar, if you need to capture what was deleted, you can also combine the delete with a "data change statement". In this example you could do something like the following, which would give you the "rn" column, one, two, and three:


SELECT * FROM OLD TABLE (
    DELETE FROM
        (SELECT 
             ROWNUMBER() OVER (PARTITION BY ONE, TWO, THREE) AS RN
            ,ONE
            ,TWO
            ,THREE
         FROM SESSION.TEST) AS A
    WHERE RN > 1
) OLD;

Answered by wildplasser

DELETE FROM the_table tt
WHERE EXISTS ( SELECT *
    FROM the_table ex
    WHERE ex.one = tt.one
    AND ex.two = tt.two
    AND ex.three = tt.three
    AND ex.zname < tt.zname -- tie-breaker...
    );

Note 1: your SQL dialect may vary. Note 2: "name" is a reserved word on some platforms, so it is better to avoid it.
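To see how the tie-breaker in the EXISTS query behaves, here is a small sketch that runs the same logic against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for DB2 here, and the statement is written without a table alias on the DELETE target, so treat it as an illustration of the idea rather than DB2 syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, zname TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# Delete every row for which another row with the same key
# and a smaller zname exists (the tie-breaker).
conn.execute(
    """
    DELETE FROM the_table
    WHERE EXISTS (
        SELECT 1
        FROM the_table ex
        WHERE ex.one = the_table.one
          AND ex.two = the_table.two
          AND ex.three = the_table.three
          AND ex.zname < the_table.zname
    )
    """
)

remaining = conn.execute(
    "SELECT one, two, three, zname FROM the_table ORDER BY one"
).fetchall()
for row in remaining:
    print(row)
```

The row kept in each group is the one with the smallest zname, so for A1/B1/C1 it is "lmn" rather than "xyz". Rows that agree on all four columns are not separated by this tie-breaker, and all of them would stay.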

Answered by danny117

A variation of @a_horse_with_no_name's answer for DB2 for iSeries that uses neither a GROUP BY clause nor an IN clause. It actually works.

DELETE FROM the_table a
WHERE rrn(a) < (
    -- highest relative record number among the rows sharing this key
    SELECT MAX(rrn(b)) FROM the_table b
    WHERE a.one = b.one AND a.two = b.two AND a.three = b.three
)

Answered by Tom S.

For others using a very old version of DB2 SQL: a combination of these posts helped identify and remove the duplicates from 2 batches that had been posted twice.

SELECT   * FROM     LIBRARY.TABLE a
WHERE    a.batch in (115131, 115287)
AND      EXISTS ( SELECT 1 from LIBRARY.TABLE d 
    WHERE d.batch in (115131, 115287)
     AND a.one = d.one AND a.two = d.two AND a.three = d.three 
    GROUP BY d.one, d.two, d.three 
    HAVING count(*) <> 1 )

    AND RRN(a) > (SELECT MIN(RRN(b)) FROM LIBRARY.TABLE b 
        WHERE b.batch in (115131, 115287)
        AND a.one = b.one AND a.two = b.two AND a.three = b.three );

Answered by Sunil Chavan

Please take a backup of the table before deleting the data.

Delete from table where Name in (select name from table
group by one,two,three
having count(*) > 2)

You can use


     DELETE from TABLE Group by one,two,three Having count(*) > 2; 

Answered by levi

DELETE  FROM Table_Name
WHERE   Table_Name_ID NOT IN ( SELECT  MAX(Table_Name_ID)
                                    FROM    Table_Name
                                    GROUP BY one ,
                                             two, 
                                             three )

Here one, two, and three are your repeated columns, and Table_Name_ID is the primary key.

Answered by a_horse_with_no_name

This is a variation of levenlevi's answer that does not require a primary key on the table (I can't test the syntax right now, though).

DELETE FROM the_table
WHERE  rid_bit(the_table) NOT IN (SELECT MAX(rid_bit(the_table))
                                  FROM the_table
                                  GROUP BY one,two,three)

I think rid_bit() is not supported on iSeries, but rrn() serves the same purpose.
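The row-identifier approach can be tried without a DB2 instance. The sketch below uses Python's sqlite3 module and SQLite's implicit rowid column as a stand-in for rid_bit() / rrn(); that substitution is an assumption made purely for illustration, and DB2 itself has no rowid column of this kind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (one TEXT, two TEXT, three TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [
        ("A1", "B1", "C1", "xyz"),
        ("A1", "B1", "C1", "pqr"),
        ("A1", "B1", "C1", "lmn"),
        ("A2", "B2", "C2", "abc"),
        ("A2", "B2", "C2", "def"),
        ("A3", "B3", "C3", "ghi"),
    ],
)

# Keep only the row with the highest row identifier in each
# (one, two, three) group; rowid plays the role of rid_bit()/rrn().
conn.execute(
    """
    DELETE FROM the_table
    WHERE rowid NOT IN (
        SELECT MAX(rowid)
        FROM the_table
        GROUP BY one, two, three
    )
    """
)

remaining = [
    name for (name,) in conn.execute("SELECT name FROM the_table ORDER BY rowid")
]
print(remaining)
```

Because MAX is used, the last-inserted row of each group is the one that survives; switching to MIN would keep the first one instead.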