SQL: How to remove duplicates from a table using a SQL query

Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/7676110/

Date: 2020-09-01 12:22:06  Source: igfitidea

How to remove duplicates from table using SQL query

Tags: sql, tsql

Asked by user7

I have a table which is as follows:

emp_name   emp_address  sex  marital_status
uuuu       eee          m    s
iiii       iii          f    s
uuuu       eee          m    s

I want to remove the duplicate entries based on 3 fields: emp_name, emp_address and sex. My resultant table (after removing the duplicates) should look like this:

emp_name    emp_address   sex   marital_status
uuuu        eee           m     s
iiii        iii           f     s

I am not able to recall how to write a SQL query for this. Can anyone please help?

Accepted answer by Roopesh Shenoy

It looks like all four column values are duplicated, so you can do this:

select distinct emp_name, emp_address, sex, marital_status
from YourTable

However, if marital status can differ and you have some other column on which to base the choice (e.g. you want the latest record based on a column create_date), you can do this:

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1 
                   from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        b.create_date > a.create_date)  -- strictly newer, so a row never matches itself
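
As a rough way to try the "latest record per key" idea, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The sample rows and the create_date values are made up for illustration, and SQLite is only an assumption here (the question is tagged tsql). The comparison is a strict `>`, because with `>=` every row would find itself in the subquery and nothing would be returned.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption: SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        emp_name TEXT, emp_address TEXT, sex TEXT,
        marital_status TEXT, create_date TEXT
    );
    INSERT INTO YourTable VALUES
        ('uuuu', 'eee', 'm', 's', '2020-01-01'),
        ('uuuu', 'eee', 'm', 'm', '2020-06-01'),
        ('iiii', 'iii', 'f', 's', '2020-03-01');
""")

# Keep a row only if no other row with the same key has a newer create_date.
latest = conn.execute("""
    SELECT emp_name, emp_address, sex, marital_status
    FROM YourTable a
    WHERE NOT EXISTS (SELECT 1
                      FROM YourTable b
                      WHERE b.emp_name = a.emp_name
                        AND b.emp_address = a.emp_address
                        AND b.sex = a.sex
                        AND b.create_date > a.create_date)
    ORDER BY emp_name
""").fetchall()
print(latest)
```

Note that two rows sharing the same key and the same create_date would both survive, so this only deduplicates fully when the date column breaks all ties.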

Answer by Kusalananda

I would create a new table with a unique index over the columns that you want to keep unique. Then do an insert from the old table into the new, ignoring the warnings about duplicated rows. Lastly, I would drop (or rename) the old table and replace it with the new table. In MySQL, this would look like

CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex, marital_status);
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;

Or something similar (this is totally untested).
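
Since the MySQL statements are described as untested, here is a small sketch of the same idea in SQLite, driven from Python so it can be run as-is. It is an adaptation, not the MySQL code: SQLite has no CREATE TABLE ... LIKE, so the copy is declared explicitly, and INSERT OR IGNORE plays the role of INSERT IGNORE. The sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO mytable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');

    -- SQLite has no CREATE TABLE ... LIKE, so the empty copy is declared explicitly.
    CREATE TABLE tmp (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex, marital_status);

    -- Rows that would violate the unique index are silently skipped.
    INSERT OR IGNORE INTO tmp SELECT * FROM mytable;

    DROP TABLE mytable;
    ALTER TABLE tmp RENAME TO mytable;
""")

rows = conn.execute("SELECT * FROM mytable ORDER BY emp_name").fetchall()
print(rows)
```

To deduplicate on only emp_name, emp_address and sex, as the question asks, the unique index would list just those three columns; which marital_status survives then depends on insertion order.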

Answer by Mikael Eriksson

This is not a query but a delete statement. It will remove duplicate rows from your table:

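-- DUPLICATE_VAARS_DECISION stands for the column(s) that define a duplicate;
-- NODE_EQ_NO decides which row in each group is kept (rn = 1).
;with C as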
(
  select row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
delete C
where rn > 1

If you are only interested in querying the table and getting the non-duplicates as a result, you should use this instead:

;with C as
(
  select *,
         row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
select *
from C
where rn = 1
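
The row_number() approach can be tried outside SQL Server as well. The sketch below is an adaptation for SQLite (version 3.25 or later, which is an assumption about the SQLite bundled with your Python). SQLite cannot delete through a CTE the way T-SQL can, so the CTE only collects the rowid of each surplus row and the DELETE targets those rowids. The partition columns are the three from the question, and ordering by rowid is an arbitrary choice for which copy to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO yourtable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');
""")

# Number the rows inside each duplicate group, then delete everything past the first.
conn.execute("""
    WITH C AS (
        SELECT rowid AS rid,
               row_number() OVER (PARTITION BY emp_name, emp_address, sex
                                  ORDER BY rowid) AS rn
        FROM yourtable
    )
    DELETE FROM yourtable
    WHERE rowid IN (SELECT rid FROM C WHERE rn > 1)
""")

remaining = conn.execute("SELECT * FROM yourtable ORDER BY emp_name").fetchall()
print(remaining)
```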

Answer by SQLMenace

One way:

select emp_name,   emp_address,  sex,  max(marital_status) as marital_status
from Yourtable
group by emp_name,   emp_address,  sex

Since I don't know what you want, I used max for the marital status
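
To make the effect of max() concrete, here is a minimal sketch run against SQLite from Python. The second 'uuuu' row with marital_status 'm' is made up for illustration; it shows that max picks the alphabetically greatest value in each group, which may or may not be the one you want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Yourtable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO Yourtable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('uuuu', 'eee', 'm', 'm'),   -- hypothetical row with a different status
        ('iiii', 'iii', 'f', 's');
""")

# One output row per (emp_name, emp_address, sex); max() collapses the status.
grouped = conn.execute("""
    SELECT emp_name, emp_address, sex, max(marital_status) AS marital_status
    FROM Yourtable
    GROUP BY emp_name, emp_address, sex
    ORDER BY emp_name
""").fetchall()
print(grouped)
```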

See also "Including an Aggregated Column's Related Values" for more examples.

Answer by Zasz

If you are okay with trading space for performance and simplicity, then the duplicates in the emp_name | emp_address | sex combo can be eliminated by introducing a calculated/derived column using the CHECKSUM() T-SQL function and using the DISTINCT keyword while querying.

Here's an example of CHECKSUM:

SELECT CHECKSUM(*) FROM HumanResources.Employee WHERE EmployeeID = 2

Google around and create a dependent column that contains the checksum of the 3 columns. Then you can select distinct rows by looking at this question

Answer by Tank Liu

The best answer is here:
Use this SQL statement to identify the extra duplicated rows:

 select * from Employee a 
where %%physloc%% >
(select min(%%physloc%%) from Employee b
where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);

You will get the extra row:

uuuu   eee m   s 


Use this SQL statement to delete the extra duplicated rows:


-- SQL Server needs the alias form "delete a from ..."; Oracle accepts "delete from Employee a".
delete a from Employee a
where %%physloc%% >
(select min(%%physloc%%) from Employee b
where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);


For all duplicated records, only the one with lowest physical location is kept. This method can be applied to remove all kinds of duplicated rows.
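
The "keep the row with the lowest row identifier" idea can be tried with SQLite's built-in rowid, which is used here as a stand-in for %%physloc%% (an assumption for illustration; the two are not the same thing internally). The sketch below uses an equivalent reformulation: instead of a correlated `>` comparison, it deletes every row whose rowid is not the minimum of its duplicate group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO Employee VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');
""")

# Rows that would be deleted: everything except the lowest rowid in each group.
extra = conn.execute("""
    SELECT emp_name, emp_address, sex, marital_status
    FROM Employee
    WHERE rowid NOT IN (SELECT min(rowid) FROM Employee
                        GROUP BY emp_name, emp_address, sex)
""").fetchall()

conn.execute("""
    DELETE FROM Employee
    WHERE rowid NOT IN (SELECT min(rowid) FROM Employee
                        GROUP BY emp_name, emp_address, sex)
""")
kept = conn.execute("SELECT * FROM Employee ORDER BY emp_name").fetchall()
print(extra, kept)
```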


I am assuming that you use MS SQL Server. If you are using Oracle DB, then you can just replace '%%physloc%%' with 'rowid'.

Enjoy the code!

Answer by Shahadat Hossain Khan

I know this is an old post, but I recently tested a solution and want to share it in case anyone finds it helpful:

CREATE TABLE tmpTable LIKE yourTable;
insert into tmpTable (col1, col2... colN) SELECT distinct col1, col2... colN FROM yourTable WHERE 1;
drop table yourTable;
RENAME TABLE tmpTable TO yourTable;

Please note that the insert into statement may execute without a primary key.
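
For a quick check of the copy-distinct-rows-and-swap approach, here is a sketch adapted to SQLite and run from Python. It is not the MySQL code: CREATE TABLE ... AS SELECT DISTINCT stands in for CREATE TABLE ... LIKE followed by the insert, and the four column names from the question replace the col1 ... colN placeholders. Be aware that a table created this way does not carry over indexes or constraints from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO yourTable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');

    -- Build the deduplicated copy in one step, then swap it in.
    CREATE TABLE tmpTable AS
        SELECT DISTINCT emp_name, emp_address, sex, marital_status FROM yourTable;

    DROP TABLE yourTable;
    ALTER TABLE tmpTable RENAME TO yourTable;
""")

deduped = conn.execute("SELECT * FROM yourTable ORDER BY emp_name").fetchall()
print(deduped)
```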

Thanks.
