T-SQL: Deleting all duplicate rows but keeping one

Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/6025367/

Date: 2020-09-01 10:35:03  Source: igfitidea

Tags: sql, sql-server, tsql

Asked by nuit9

I have a table with a very large number of rows. Duplicates are not allowed, but due to a problem with how the rows were created, I know there are some duplicates in this table. I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have slightly different data, but I do not care about that. I still need to keep one of these rows, however. SELECT DISTINCT won't work because it operates on all columns, and I need to suppress duplicates based on the key columns only.

How can I efficiently delete the extra rows but still keep one?
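
To make the problem concrete, here is a minimal sketch of the situation described above. The table variable @Rows and the columns KeyA, KeyB, and Note are hypothetical names chosen for illustration; they do not come from the question. It shows that SELECT DISTINCT does not collapse rows whose non-key columns differ, and how the duplicated key combinations could be listed before deleting anything.

```sql
-- Hypothetical sample table: KeyA and KeyB stand in for the key columns,
-- Note stands in for a non-key column whose values may differ.
DECLARE @Rows AS TABLE (KeyA int, KeyB int, Note varchar(20));

INSERT INTO @Rows
SELECT 1, 10, 'first'  UNION ALL
SELECT 1, 10, 'second' UNION ALL
SELECT 2, 20, 'only';

-- DISTINCT compares every selected column, so all three rows come back
-- because the Note values differ.
SELECT DISTINCT KeyA, KeyB, Note FROM @Rows;

-- List each key combination that occurs more than once, with its row count.
SELECT KeyA, KeyB, COUNT(*) AS DuplicateCount
FROM @Rows
GROUP BY KeyA, KeyB
HAVING COUNT(*) > 1;
```

With this sample data, the second query reports the single key combination (1, 10) with a count of 2.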

Answered by Ben Thul

You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression with the OVER clause. It goes a little something like this:

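WITH cte AS (
  SELECT [foo], [bar],
     -- number the rows within each (foo, bar) group, ordered by baz
     ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz) AS [rn]
  FROM [TABLE]  -- TABLE is a placeholder for your table name
)
-- delete every row except the first one in each group
DELETE cte WHERE [rn] > 1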

Play around with it and see what you get.

(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case here is that "foo, bar" are the group identifier and "baz" is some sort of timestamp. To keep the latest row, you'd use ORDER BY baz DESC.)
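
As a rough illustration of the "keep the latest" case from the note above, here is a self-contained sketch. The table variable @Sample and its data are assumptions made up for this example; only the column names foo, bar, and baz follow the answer.

```sql
-- Hypothetical sample data: (foo, bar) identifies a group, baz is a timestamp.
DECLARE @Sample AS TABLE (foo int, bar int, baz datetime);

INSERT INTO @Sample
SELECT 1, 1, '20200101' UNION ALL
SELECT 1, 1, '20200301' UNION ALL
SELECT 2, 5, '20200201';

-- Number the rows in each (foo, bar) group, newest baz first,
-- then delete everything except row 1 of each group.
WITH cte AS (
    SELECT foo, bar, baz,
           ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz DESC) AS rn
    FROM @Sample
)
DELETE FROM cte WHERE rn > 1;

-- Show what is left.
SELECT foo, bar, baz FROM @Sample;
```

With this data, the row for (1, 1) dated 2020-01-01 is deleted, and the 2020-03-01 row and the single (2, 5) row remain. If two rows in a group share the same baz value, which of them survives is not guaranteed, so a tie-breaking column in the ORDER BY may be worth adding.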

Answered by jams

Example query:

DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)

Here, the fields are the columns on which you want to group the duplicate rows.
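
A runnable sketch of the same pattern, assuming a hypothetical table variable @People with a unique, non-NULL ID column and two grouping columns (the table name and data are invented for illustration):

```sql
-- Hypothetical sample data: ID is unique; Field1 and Field2 define a duplicate.
DECLARE @People AS TABLE (ID int, Field1 varchar(20), Field2 varchar(20));

INSERT INTO @People
SELECT 1, 'Ann', 'Lee' UNION ALL
SELECT 2, 'Ann', 'Lee' UNION ALL
SELECT 3, 'Bob', 'Ray' UNION ALL
SELECT 4, 'Ann', 'Lee';

-- Keep the row with the smallest ID in each (Field1, Field2) group
-- and delete all other rows.
DELETE FROM @People
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM @People
    GROUP BY Field1, Field2
);

-- Show what is left.
SELECT ID, Field1, Field2 FROM @People;
```

With this data, the rows with ID 1 and ID 3 remain. The approach relies on ID being unique; if several rows in a group shared the same minimum ID, all of them would be kept.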

Answered by C???

Here's my twist on it, with a runnable example. Note that this will only work when Id is unique and the duplicate values are in other columns.

DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20))

INSERT INTO @SampleData
SELECT 1, 'ABC' UNION ALL
SELECT 2, 'ABC' UNION ALL
SELECT 3, 'LMN' UNION ALL
SELECT 4, 'XYZ' UNION ALL
SELECT 5, 'XYZ'

DELETE FROM @SampleData WHERE Id IN (
    SELECT Id FROM (
        SELECT 
            Id
            ,ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
            -- Change the partition columns to include the ones that make the row distinct
        FROM 
            @SampleData
    ) a WHERE ItemNumber > 1 -- Keep only the first unique item
)

SELECT * FROM @SampleData

And the results:

Id          Duplicate
----------- ---------
1           ABC
3           LMN
4           XYZ

Not sure why that's what I thought of first... definitely not the simplest way to go but it works.
