从没有主键的 SQL 表中删除重复记录

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/985384/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 02:25:33  来源:igfitidea点击:

Delete duplicate records from a SQL table without a primary key

sqlsql-server-2005tsqlduplicate-removal

提问by Shyju

I have the below table with the below records in it

我有下表,其中包含以下记录

create table employee
(
 EmpId number,
 EmpName varchar2(10),
 EmpSSN varchar2(11)
);

insert into employee values(1, 'Hyman', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (1, 'Hyman', '555-55-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6 ,'Lisa', '555-70-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');

I dont have any primary key in this table .But i have the above records in my table already. I want to remove the duplicate records which has the same value in EmpId and EmpSSN fields.

我在这个表中没有任何主键。但我的表中已经有上述记录。我想删除在 EmpId 和 EmpSSN 字段中具有相同值的重复记录。

Ex : Emp id 5

例如:Emp id 5

Can any one help me to frame a query to delete those duplicate records

任何人都可以帮助我构建一个查询以删除那些重复记录

Thanks in advance

提前致谢

回答by Anjib Rajkhowa

It is very simple. I tried in SQL Server 2008

这很简单。我在 SQL Server 2008 中尝试过

DELETE SUB FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY EmpId, EmpName, EmpSSN ORDER BY EmpId) cnt
 FROM Employee) SUB
WHERE SUB.cnt > 1

回答by cjk

Add a Primary Key (code below)

添加主键(下面的代码)

Run the correct delete (code below)

运行正确的删除(下面的代码)

Consider WHY you woudln't want to keep that primary key.

考虑为什么您不想保留该主键。



Assuming MSSQL or compatible:

假设 MSSQL 或兼容:

ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;

WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
    DELETE FROM Employee WHERE EmployeeID IN 
    (
        SELECT MIN(EmployeeID) as [DeleteID]
        FROM Employee
        GROUP BY EmpID, EmpSSN
        HAVING COUNT(*) > 1
    )
END

回答by Paul Morgan

Use the row number to differentiate between duplicate records. Keep the first row number for an EmpID/EmpSSN and delete the rest:

使用行号来区分重复记录。保留 EmpID/EmpSSN 的第一行号并删除其余行号:

    DELETE FROM Employee a
     WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
                               FROM Employee b
                              WHERE a.EmpID  = b.EmpID
                                AND a.EmpSSN = b.EmpSSN )

回答by Nirav Parikh

With duplicates

As
(Select *, ROW_NUMBER() Over (PARTITION by EmpID,EmpSSN Order by EmpID,EmpSSN) as Duplicate From Employee)

delete From duplicates

Where Duplicate > 1 ;

This will update Table and remove all duplicates from the Table!

这将更新表并从表中删除所有重复项!

回答by naga vara prasad

select distinct * into newtablename from oldtablename

Now, the newtablenamewill have no duplicate records.

现在,newtablename将没有重复的记录。

Simply change the table name(newtablename) by pressing F2 in object explorer in sql server.

只需newtablename在 sql server 的对象资源管理器中按 F2更改表名()。

回答by Daren Thomas

You could create a temporary table #tempemployeecontaining a select distinctof your employeetable. Then delete from employee. Then insert into employee select from #tempemployee.

您可以创建一个临时表#tempemployee包含select distinct您的employee表。然后delete from employee。然后insert into employee select from #tempemployee

Like Josh said - even if you know the duplicates, deleting them will be impossile since you cannot actually refer to a specific record if it is an exact duplicate of another record.

就像乔希说的 - 即使你知道重复项,删除它们也是不可能的,因为如果它是另一条记录的完全重复,你实际上无法引用特定记录。

回答by kamz kamarajan

Code

代码

DELETE DUP 
FROM 
( 
    SELECT ROW_NUMBER() OVER (PARTITION BY Clientid ORDER BY Clientid ) AS Val 
    FROM ClientMaster 
) DUP 
WHERE DUP.Val > 1

Explanation

解释

Use an inner query to construct a view over the table which includes a field based on Row_Number(), partitioned by those columns you wish to be unique.

使用内部查询来构建表的视图,该视图包括一个基于 的字段Row_Number(),由您希望唯一的那些列分区。

Delete from the results of this inner query, selecting anything which does not have a row number of 1; i.e. the duplicates; not the original.

从这个内部查询的结果中删除,选择行号不为 1 的任何内容;即重复;不是原版。

The order byclause of the row_number window function is needed for a valid syntax; you can put any column name here. If you wish to change which of the results is treated as a duplicate (e.g. keep the earliest or most recent, etc), then the column(s) used here do matter; i.e. you want to specify the order such that the record you wish to keep will come first in the result.

order by有效语法需要 row_number 窗口函数的子句;您可以在此处输入任何列名称。如果您希望更改将哪些结果视为重复结果(例如保留最早的或最近的等),那么此处使用的列很重要;即您要指定顺序,以便您希望保留的记录在结果中排​​在第一位。

回答by Joe

If you don't want to create a new primary key you can use the TOP command in SQL Server:

如果不想创建新的主键,可以在 SQL Server 中使用 TOP 命令:

declare @ID int
while EXISTS(select count(*) from Employee group by EmpId having count(*)> 1)
begin
    select top 1 @ID = EmpId
    from Employee 
    group by EmpId
    having count(*) > 1

    DELETE TOP(1) FROM Employee WHERE EmpId = @ID
end

回答by Abhishek Jaiswal

ITS easy use below query

ITS 易于使用以下查询

WITH Dups AS
(
  SELECT col1,col2,col3,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY (SELECT 0)) AS rn
 FROM mytable
)
DELETE FROM Dups WHERE rn > 1

回答by Sudhar P

delete sub from (select ROW_NUMBER() OVer(Partition by empid order by empid)cnt from employee)sub where sub.cnt>1

delete sub from (select ROW_NUMBER() Over(Partition by empid order by empid)cnt from employee)sub where sub.cnt>1