Original question: http://stackoverflow.com/questions/6025367/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
T-SQL: Deleting all duplicate rows but keeping one
Asked by nuit9
I have a table with a very large number of rows. Duplicates are not allowed, but due to a problem with how the rows were created, I know there are some duplicates in this table. I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have slightly different data, but I do not care about that. I still need to keep one of these rows, however. SELECT DISTINCT won't work because it operates on all columns, and I need to suppress duplicates based on the key columns only.
How can I delete the extra rows but still keep one efficiently?
Answer by Ben Thul
You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression with the OVER clause. It goes a little something like this:
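WITH cte AS (
    SELECT [foo], [bar],
           ROW_NUMBER() OVER (PARTITION BY [foo], [bar] ORDER BY [baz]) AS [rn]
    FROM YourTable  -- placeholder; substitute your table name (TABLE is a reserved word)
)
DELETE FROM cte WHERE [rn] > 1;  -- rn = 1 is the row kept in each (foo, bar) group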
Play around with it and see what you get.
(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case is that "foo, bar" is the group identifier and "baz" is some sort of timestamp. To keep the latest row, you'd use ORDER BY baz DESC.)
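As a minimal sketch of the "keep the latest" case, the following uses a hypothetical table variable @Events in which foo and bar form the group identifier and baz is a timestamp. The table name, the note column, and the sample values are assumptions made purely for illustration.

```sql
-- Hypothetical sample data: (foo, bar) identifies a group, baz is a timestamp.
DECLARE @Events TABLE (foo int, bar int, baz datetime, note varchar(20));

INSERT INTO @Events (foo, bar, baz, note)
SELECT 1, 1, '20200101', 'old'  UNION ALL
SELECT 1, 1, '20200201', 'new'  UNION ALL
SELECT 2, 1, '20200115', 'only';

-- Number the rows in each (foo, bar) group, newest first,
-- then delete everything except the newest row (rn = 1).
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz DESC) AS rn
    FROM @Events
)
DELETE FROM cte WHERE rn > 1;

-- The 'old' row for group (1, 1) should be gone; 'new' and 'only' remain.
SELECT foo, bar, baz, note FROM @Events ORDER BY foo, bar;
```

Switching the ordering to ORDER BY baz ASC would keep the oldest row in each group instead.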
Answer by jams
Example query:
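DELETE FROM YourTable        -- placeholder table name; TABLE itself is a reserved word
WHERE ID NOT IN
(
    SELECT MIN(ID)           -- keep the lowest ID in each group of duplicates
    FROM YourTable
    GROUP BY Field1, Field2, Field3, ...
)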
Here, Field1, Field2, Field3, ... are the columns on which you want to group the duplicate rows.
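To make the MIN(ID) approach concrete, here is a small sketch against a hypothetical table variable @People with a unique ID column and two grouping columns. The table name, column types, and sample values are assumptions for illustration only.

```sql
-- Hypothetical sample data: ID is unique, (Field1, Field2) may repeat.
DECLARE @People TABLE (ID int, Field1 varchar(20), Field2 varchar(20));

INSERT INTO @People (ID, Field1, Field2)
SELECT 1, 'Ann', 'Lee' UNION ALL
SELECT 2, 'Ann', 'Lee' UNION ALL
SELECT 3, 'Bob', 'Ray';

-- Keep only the row with the lowest ID in each (Field1, Field2) group.
DELETE FROM @People
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM @People
    GROUP BY Field1, Field2
);

-- Rows with ID 1 and ID 3 should remain; ID 2 was the duplicate.
SELECT ID, Field1, Field2 FROM @People ORDER BY ID;
```

This variant relies on ID being unique and non-NULL; using MAX(ID) instead would keep the highest ID in each group.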
Answer by C???
Here's my twist on it, with a runnable example. Note that this will only work when Id is unique and the duplicate values are in other columns.
DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20));

INSERT INTO @SampleData
SELECT 1, 'ABC' UNION ALL
SELECT 2, 'ABC' UNION ALL
SELECT 3, 'LMN' UNION ALL
SELECT 4, 'XYZ' UNION ALL
SELECT 5, 'XYZ';

DELETE FROM @SampleData WHERE Id IN (
    SELECT Id FROM (
        SELECT
            Id
            ,ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
            -- Change the partition columns to include the ones that make the row distinct
        FROM
            @SampleData
    ) a WHERE ItemNumber > 1 -- Keep only the first unique item
);

SELECT * FROM @SampleData;
And the results:
Id          Duplicate
----------- ---------
1           ABC
3           LMN
4           XYZ
Not sure why that's what I thought of first... definitely not the simplest way to go but it works.