删除 SQL Server 中的重复记录?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3317433/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 06:55:44  来源:igfitidea点击:

Delete duplicate records in SQL Server?

sqltsqlduplicatesdelete-row

提问by usr021986

Consider a column named EmployeeNametable Employee. The goal is to delete repeated records, based on the EmployeeNamefield.

考虑一个名为EmployeeNametable的列Employee。目标是根据EmployeeName字段删除重复记录。

EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil

Using one query, I want to delete the records which are repeated.

使用一个查询,我想删除重复的记录。

How can this be done with TSQL in SQL Server?

如何使用 SQL Server 中的 TSQL 完成此操作?

回答by John Gibb

You can do this with window functions. It will order the dupes by empId, and delete all but the first one.

你可以用窗口函数来做到这一点。它将按 empId 对欺骗进行排序,并删除除第一个之外的所有内容。

delete x from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

Run it as a select to see what would be deleted:

将其作为选择运行以查看将删除的内容:

select *
from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

回答by StuartLC

Assuming that your Employee table also has a unique column (IDin the example below), the following will work:

假设您的 Employee 表也有一个唯一的列(ID在下面的示例中),以下将起作用:

delete from Employee 
where ID not in
(
    select min(ID)
    from Employee 
    group by EmployeeName 
);

This will leave the version with the lowest ID in the table.

这将保留表中具有最低 ID 的版本。

Edit
Re McGyver's comment - as of SQL 2012

编辑
Re McGyver 的评论 - 从SQL 2012 开始

MINcan be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns

MIN可以与数字、字符、varchar、uniqueidentifier 或日期时间列一起使用,但不能与位列一起使用

For 2008 R2and earlier,

对于2008 R2及更早版本,

MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUID's)

MIN 可用于数字、字符、varchar 或日期时间列,但不能用于位列(并且它也不适用于 GUID)

For 2008R2 you'll need to cast the GUIDto a type supported by MIN, e.g.

对于 2008R2,您需要将GUID转换为 支持的类型MIN,例如

delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
    select min(CAST(ID AS binary(16)))
    from GuidEmployees
    group by EmployeeName 
);

SqlFiddle for various types in Sql 2008

SqlFiddle 用于 Sql 2008 中的各种类型

SqlFiddle for various types in Sql 2012

SqlFiddle 用于 Sql 2012 中的各种类型

回答by Ben Cawley

You could try something like the following:

您可以尝试以下操作:

delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField  

(this assumes that you have an integer based unique field)

(这假设您有一个基于整数的唯一字段)

Personally though I'd say you were better off trying to correct the fact that duplicate entries are being added to the database before it occurs rather than as a post fix-it operation.

就我个人而言,尽管我会说你最好尝试纠正这样一个事实,即重复条目在发生之前被添加到数据库中,而不是作为后修复操作。

回答by Mostafa Elmoghazi

WITH CTE AS
(
   SELECT EmployeeName, 
          ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
   FROM employee_table
)
DELETE CTE WHERE R > 1;

The magic of common table expressions.

常用表表达式的神奇之处。

回答by Kumar Manish

DELETE
FROM MyTable
WHERE ID NOT IN (
     SELECT MAX(ID)
     FROM MyTable
     GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)


WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
    SELECT FirstName, LastName,
    ROW_NUMBER() OVER (PARTITIONBY FirstName, LastName ORDERBY FirstName) AS duplicateRecordCount
    FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1

回答by Anurag Garg

Try

尝试

DELETE
FROM employee
WHERE rowid NOT IN (SELECT MAX(rowid) FROM employee
GROUP BY EmployeeName);

回答by Peter

If you're looking for a way to remove duplicates, yet you have a foreign key pointing to the table with duplicates, you could take the following approach using a slow yet effective cursor.

如果您正在寻找一种删除重复项的方法,但您有一个指向包含重复项的表的外键,则可以使用缓慢但有效的游标来采用以下方法。

It will relocate the duplicate keys on the foreign key table.

它将重新定位外键表上的重复键。

create table #properOlvChangeCodes(
    id int not null,
    name nvarchar(max) not null
)

DECLARE @name VARCHAR(MAX);
DECLARE @id INT;
DECLARE @newid INT;
DECLARE @oldid INT;

DECLARE OLVTRCCursor CURSOR FOR SELECT id, name FROM Sales_OrderLineVersionChangeReasonCode; 
OPEN OLVTRCCursor;
FETCH NEXT FROM OLVTRCCursor INTO @id, @name;
WHILE @@FETCH_STATUS = 0  
BEGIN  
        -- determine if it should be replaced (is already in temptable with name)
        if(exists(select * from #properOlvChangeCodes where Name=@name)) begin
            -- if it is, finds its id
            Select  top 1 @newid = id
            from    Sales_OrderLineVersionChangeReasonCode
            where   Name = @name

            -- replace terminationreasoncodeid in olv for the new terminationreasoncodeid
            update Sales_OrderLineVersion set ChangeReasonCodeId = @newid where ChangeReasonCodeId = @id

            -- delete the record from the terminationreasoncode
            delete from Sales_OrderLineVersionChangeReasonCode where Id = @id
        end else begin
            -- insert into temp table if new
            insert into #properOlvChangeCodes(Id, name)
            values(@id, @name)
        end

        FETCH NEXT FROM OLVTRCCursor INTO @id, @name;
END;
CLOSE OLVTRCCursor;
DEALLOCATE OLVTRCCursor;

drop table #properOlvChangeCodes

回答by ohsoifelse

delete from person 
where ID not in
(
        select t.id from 
        (select min(ID) as id from person 
         group by email 
        ) as t
);

回答by Jithin Shaji

Please see the below way of deletion too.

请参阅下面的删除方式。

Declare @Employee table (EmployeeName varchar(10))

Insert into @Employee values 
('Anand'),('Anand'),('Anil'),('Dipak'),
('Anil'),('Dipak'),('Dipak'),('Anil')

Select * from @Employee

enter image description here

enter image description here

Created a sample table named @Employeeand loaded it with given data.

创建了一个示例表@Employee,并使用给定的数据加载它。

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by EmployeeName order by EmployeeName) as rowNumber
From    @Employee) aliasName 
Where   rowNumber > 1

Select * from @Employee

Result:

结果:

enter image description here

enter image description here

I know, this is asked six years ago, posting just incase it is helpful for anyone.

我知道,这是六年前问的,发布只是为了对任何人都有帮助。

回答by Daniel Marcus

Here's a nice way of deduplicating records in a table that has an identity column based on a desired primary key that you can define at runtime. Before I start I'll populate a sample data set to work with using the following code:

这是对表中的记录进行重复数据删除的一种好方法,该表具有基于您可以在运行时定义的所需主键的标识列。在开始之前,我将使用以下代码填充要使用的示例数据集:

if exists (select 1 from sys.all_objects where type='u' and name='_original')
drop table _original

declare @startyear int = 2017
declare @endyear int = 2018
declare @iterator int = 1
declare @income money = cast((SELECT round(RAND()*(5000-4990)+4990 , 2)) as money)
declare @salesrepid int = cast(floor(rand()*(9100-9000)+9000) as varchar(4))
create table #original (rowid int identity, monthyear varchar(max), salesrepid int, sale money)
while @iterator<=50000 begin
insert #original 
select (Select cast(floor(rand()*(@endyear-@startyear)+@startyear) as varchar(4))+'-'+ cast(floor(rand()*(13-1)+1) as varchar(2)) ),  @salesrepid , @income
set  @salesrepid  = cast(floor(rand()*(9100-9000)+9000) as varchar(4))
set @income = cast((SELECT round(RAND()*(5000-4990)+4990 , 2)) as money)
set @iterator=@iterator+1
end  
update #original
set monthyear=replace(monthyear, '-', '-0') where  len(monthyear)=6

select * into _original from #original

Next I'll create a Type called ColumnNames:

接下来,我将创建一个名为 ColumnNames 的类型:

create type ColumnNames AS table   
(Columnnames varchar(max))

Finally I will create a stored proc with the following 3 caveats: 1. The proc will take a required parameter @tablename that defines the name of the table you are deleting from in your database. 2. The proc has an optional parameter @columns that you can use to define the fields that make up the desired primary key that you are deleting against. If this field is left blank, it is assumed that all the fields besides the identity column make up the desired primary key. 3. When duplicate records are deleted, the record with the lowest value in it's identity column will be maintained.

最后,我将创建一个带有以下 3 个注意事项的存储过程: 1. 过程将采用必需的参数 @tablename,该参数定义了您要从数据库中删除的表的名称。2. proc 有一个可选参数@columns,您可以使用它来定义构成要删除的所需主键的字段。如果此字段留空,则假定除标识列之外的所有字段都构成所需的主键。3. 删除重复记录时,将保留其标识列中值最低的记录。

Here is my delete_dupes stored proc:

这是我的 delete_dupes 存储过程:

 create proc delete_dupes (@tablename varchar(max), @columns columnnames readonly) 
 as
 begin

declare @table table (iterator int, name varchar(max), is_identity int)
declare @tablepartition table (idx int identity, type varchar(max), value varchar(max))
declare @partitionby varchar(max)  
declare @iterator int= 1 


if exists (select 1 from @columns)  begin
declare @columns1 table (iterator int, columnnames varchar(max))
insert @columns1
select 1, columnnames from @columns
set @partitionby = (select distinct 
                substring((Select ', '+t1.columnnames 
                From @columns1 t1
                Where T1.iterator = T2.iterator
                ORDER BY T1.iterator
                For XML PATH ('')),2, 1000)  partition
From @columns1 T2 )

end

insert @table 
select 1, a.name, is_identity from sys.all_columns a join sys.all_objects b on a.object_id=b.object_id
where b.name = @tablename  

declare @identity varchar(max)= (select name from @table where is_identity=1)

while @iterator>=0 begin 
insert @tablepartition
Select          distinct case when @iterator=1 then 'order by' else 'over (partition by' end , 
                substring((Select ', '+t1.name 
                From @table t1
                Where T1.iterator = T2.iterator and is_identity=@iterator
                ORDER BY T1.iterator
                For XML PATH ('')),2, 5000)  partition
From @table T2
set @iterator=@iterator-1
end 

declare @originalpartition varchar(max)

if @partitionby is null begin
select @originalpartition  = replace(b.value+','+a.type+a.value ,'over (partition by','')  from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
select @partitionby = a.type+a.value+' '+b.type+a.value+','+b.value+') rownum' from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
 end
 else
 begin
 select @originalpartition=b.value +','+ @partitionby from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1
 set @partitionby = (select 'OVER (partition by'+ @partitionby  + ' ORDER BY'+ @partitionby + ','+b.value +') rownum'
 from @tablepartition a cross join @tablepartition b where a.idx=2 and b.idx=1)
 end


exec('select row_number() ' + @partitionby +', '+@originalpartition+' into ##temp from '+ @tablename+'')


exec(
'delete a from _original a 
left join ##temp b on a.'+@identity+'=b.'+@identity+' and rownum=1  
where b.rownum is null')

drop table ##temp

end

Once this is complied, you can delete all your duplicate records by running the proc. To delete dupes without defining a desired primary key use this call:

完成后,您可以通过运行 proc 删除所有重复记录。要在不定义所需主键的情况下删除重复项,请使用以下调用:

exec delete_dupes '_original'

To delete dupes based on a defined desired primary key use this call:

要根据定义的所需主键删除重复项,请使用此调用:

declare @table1 as columnnames
insert @table1
values ('salesrepid'),('sale')
exec delete_dupes '_original' , @table1