SQL: How to remove duplicates from a table using a SQL query
Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/7676110/
How to remove duplicates from table using SQL query
Question by user7
I have a table which is as follows:
emp_name  emp_address  sex  marital_status
uuuu      eee          m    s
iiii      iii          f    s
uuuu      eee          m    s
I want to remove the duplicate entries based on three fields: emp_name, emp_address and sex. My resulting table (after removing the duplicates) should look like this:
emp_name  emp_address  sex  marital_status
uuuu      eee          m    s
iiii      iii          f    s
I can't recall how to write a SQL query for this. Can anyone please help?
Accepted answer by Roopesh Shenoy
It looks like all four column values are duplicated, so you can do this:
select distinct emp_name, emp_address, sex, marital_status
from YourTable
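As a quick check, here is a minimal sketch that runs the DISTINCT query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite stands in for whatever database the asker actually uses (an assumption); the sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
# Sample rows from the question: the first and third rows are identical.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# DISTINCT collapses rows that are equal in every selected column.
rows = conn.execute(
    "SELECT DISTINCT emp_name, emp_address, sex, marital_status "
    "FROM YourTable ORDER BY emp_name"
).fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```

Note that SELECT DISTINCT only returns a de-duplicated result set; the table itself still holds all three rows until the result is written back somewhere.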
However, if marital_status can differ and you have some other column to choose by (for example, you want the latest record based on a create_date column), you can do this:
select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1
                  from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        -- strict ">": with ">=" every row would match itself and nothing would be returned
                        b.create_date > a.create_date)
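A minimal sketch of this NOT EXISTS pattern in SQLite via Python's sqlite3 module. The create_date values and the differing marital_status of the third row are hypothetical sample data, added only so there is something to choose between. It also shows why the comparison should be a strict >: with >= every row finds itself in the subquery, so the query returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (emp_name TEXT, emp_address TEXT, sex TEXT, "
    "marital_status TEXT, create_date TEXT)"
)
# Hypothetical rows: 'uuuu' appears twice with different marital_status values.
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)",
    [
        ("uuuu", "eee", "m", "s", "2011-01-01"),
        ("iiii", "iii", "f", "s", "2011-01-02"),
        ("uuuu", "eee", "m", "m", "2011-02-01"),
    ],
)

def latest_rows(op):
    # Keep a row only if no row in its (emp_name, emp_address, sex) group
    # has a create_date satisfying "b.create_date <op> a.create_date".
    # op is one of the fixed literals ">" or ">=", never user input.
    sql = f"""
        SELECT emp_name, emp_address, sex, marital_status
        FROM YourTable a
        WHERE NOT EXISTS (SELECT 1
                          FROM YourTable b
                          WHERE b.emp_name = a.emp_name
                            AND b.emp_address = a.emp_address
                            AND b.sex = a.sex
                            AND b.create_date {op} a.create_date)
        ORDER BY emp_name
    """
    return conn.execute(sql).fetchall()

print(latest_rows(">"))   # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 'm')]
print(latest_rows(">="))  # [] because each row matches itself
```

ISO-8601 date strings are used so that text comparison follows date order. If two rows in a group share the same latest create_date, both survive, so a tie-breaking column may be needed.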
Answer by Kusalananda
I would create a new table with a unique index over the columns that you want to keep unique. Then do an insert from the old table into the new one, ignoring the warnings about duplicated rows. Lastly, I would drop (or rename) the old table and replace it with the new table. In MySQL, this would look like:
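-- copy the table structure, then enforce uniqueness on the columns that define a duplicate
CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex);
-- INSERT IGNORE skips (with a warning) rows that would violate the unique index
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;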
Or something similar (this is totally untested).
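The same idea can be sketched in SQLite through Python's sqlite3 module. This is an analogue of the MySQL statements, not a drop-in replacement: SQLite has no CREATE TABLE ... LIKE, so the copy is declared explicitly, and SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE. The unique index covers the three columns named in the question.

```python
import sqlite3

SAMPLE = [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")]
COLUMNS = "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE mytable {COLUMNS}")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", SAMPLE)

# SQLite has no CREATE TABLE ... LIKE, so the copy is declared explicitly.
conn.execute(f"CREATE TABLE tmp {COLUMNS}")
# Unique index over the three columns that define a duplicate.
conn.execute("CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex)")
# INSERT OR IGNORE silently skips rows that would violate the unique index.
conn.execute("INSERT OR IGNORE INTO tmp SELECT * FROM mytable")
# Swap the de-duplicated copy in place of the original table.
conn.execute("DROP TABLE mytable")
conn.execute("ALTER TABLE tmp RENAME TO mytable")

rows = conn.execute("SELECT * FROM mytable ORDER BY emp_name").fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```

Which of several duplicates survives depends on the order in which the SELECT returns them, and that order is not guaranteed without an ORDER BY.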
Answer by Mikael Eriksson
This is not a query but a DELETE statement. It deletes the duplicate rows from your table:
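-- SQL Server (T-SQL): number the rows in each duplicate group, then delete all but the first
;with C as
(
  select row_number() over(partition by emp_name, emp_address, sex
                           order by marital_status) as rn  -- order by decides which row survives
  from yourtable
)
delete C
where rn > 1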
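SQLite cannot DELETE through a common table expression the way SQL Server can, so a sketch of the same ROW_NUMBER() technique there matches the numbered rows back to the table through SQLite's implicit rowid. It runs via Python's sqlite3 module and assumes SQLite 3.25 or newer (needed for window functions); ordering by marital_status inside each group is an arbitrary choice that decides which row is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Number the rows inside each (emp_name, emp_address, sex) group and delete
# every row numbered above 1, matching rows back through the implicit rowid.
conn.execute("""
    DELETE FROM yourtable
    WHERE rowid IN (
        SELECT rid
        FROM (SELECT rowid AS rid,
                     row_number() OVER (PARTITION BY emp_name, emp_address, sex
                                        ORDER BY marital_status) AS rn
              FROM yourtable)
        WHERE rn > 1
    )
""")

rows = conn.execute("SELECT * FROM yourtable ORDER BY emp_name").fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```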
If you are only interested in querying the table and getting the non-duplicated rows as a result, you should use this instead:
-- SQL Server (T-SQL): read only the first row of each duplicate group
;with C as
(
  select *,
         row_number() over(partition by emp_name, emp_address, sex
                           order by marital_status) as rn
  from yourtable
)
select *
from C
where rn = 1
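The query-only variant can be sketched the same way in SQLite (3.25 or newer, assumed, for window functions) through Python's sqlite3 module. Nothing is deleted; the rows are numbered per group and only rn = 1 is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Number rows per (emp_name, emp_address, sex) group; keep only the first of each.
rows = conn.execute("""
    WITH C AS (
        SELECT *,
               row_number() OVER (PARTITION BY emp_name, emp_address, sex
                                  ORDER BY marital_status) AS rn
        FROM yourtable
    )
    SELECT emp_name, emp_address, sex, marital_status
    FROM C
    WHERE rn = 1
    ORDER BY emp_name
""").fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]

# The table itself is untouched.
print(conn.execute("SELECT COUNT(*) FROM yourtable").fetchone()[0])  # 3
```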
Answer by SQLMenace
One way:
select emp_name, emp_address, sex, max(marital_status) as marital_status
from Yourtable
group by emp_name, emp_address, sex
Since I don't know which value you want, I used MAX for marital_status.
See also "Including an Aggregated Column's Related Values" for more examples.
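A small SQLite sketch of the GROUP BY approach, run through Python's sqlite3 module. The third row's marital_status is changed to 'm' (hypothetical data) so that MAX has two different values to choose from; because 's' sorts after 'm', MAX picks 's'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Yourtable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
# Hypothetical data: the duplicate 'uuuu' row has a different marital_status.
conn.executemany(
    "INSERT INTO Yourtable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "m")],
)

# One output row per (emp_name, emp_address, sex); MAX picks a marital_status.
rows = conn.execute("""
    SELECT emp_name, emp_address, sex, MAX(marital_status) AS marital_status
    FROM Yourtable
    GROUP BY emp_name, emp_address, sex
    ORDER BY emp_name
""").fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```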
Answer by Zasz
If you are okay with trading space for performance and simplicity, then duplicates in the emp_name | emp_address | sex combination can be eliminated by introducing a calculated/derived column using the T-SQL CHECKSUM() function, and using the DISTINCT keyword while querying.
Here's an example of CHECKSUM:
SELECT CHECKSUM(*) FROM HumanResources.Employee WHERE EmployeeID = 2
Google around and create a dependent column that contains the checksum of the 3 columns. Then you can select distinct rows by looking at this question
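SQLite has no CHECKSUM() function, so this sketch registers a hypothetical checksum() built on Python's zlib.crc32 as a stand-in (it is not the same algorithm as T-SQL's CHECKSUM). It stores the value in a derived column and then uses DISTINCT, as the answer describes. Any checksum can collide: two different emp_name/emp_address/sex combinations may produce the same value, which is why the DISTINCT below also includes the real columns.

```python
import sqlite3
import zlib

def checksum(*values):
    # Hypothetical stand-in for T-SQL CHECKSUM(): CRC-32 of the values joined
    # with a unit-separator character. Different inputs can still collide.
    joined = "\x1f".join(str(v) for v in values)
    return zlib.crc32(joined.encode("utf-8"))

conn = sqlite3.connect(":memory:")
conn.create_function("checksum", 3, checksum)
conn.execute(
    "CREATE TABLE YourTable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Derived column holding the checksum of the three duplicate-defining columns.
conn.execute("ALTER TABLE YourTable ADD COLUMN chk INTEGER")
conn.execute("UPDATE YourTable SET chk = checksum(emp_name, emp_address, sex)")

# DISTINCT over the checksum plus the real columns, so a checksum collision
# cannot merge two genuinely different rows.
rows = conn.execute(
    "SELECT DISTINCT chk, emp_name, emp_address, sex FROM YourTable ORDER BY emp_name"
).fetchall()
print(rows)
```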
Answer by Tank Liu
The best answer is here:
Use this SQL statement to identify the extra duplicated rows:
select * from Employee a
where %%physloc%% >
(select min(%%physloc%%) from Employee b
where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);
You will get the extra row:
uuuu eee m s
Use this SQL statement to delete the extra duplicated rows:
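-- SQL Server syntax: T-SQL does not accept an alias directly after DELETE FROM <table>
-- (in Oracle, write "delete from Employee a" and use rowid instead of %%physloc%%)
delete a from Employee a
where %%physloc%% >
(select min(%%physloc%%) from Employee b
where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);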
For each group of duplicated records, only the one with the lowest physical location is kept. The same method can remove other kinds of duplicated rows; just change the columns compared in the subquery.
I am assuming that you use Microsoft SQL Server. If you are using Oracle Database, you can just replace %%physloc%% with rowid.
Enjoy the code!
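SQLite also exposes a rowid, much like Oracle's, so the same keep-the-lowest-location idea can be sketched there through Python's sqlite3 module. The table name Employee and the sample rows follow the question; running it in SQLite rather than SQL Server or Oracle is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Rows whose rowid is higher than the lowest rowid in their group are the extras.
extra = conn.execute("""
    SELECT a.emp_name, a.emp_address, a.sex, a.marital_status
    FROM Employee a
    WHERE a.rowid > (SELECT MIN(b.rowid) FROM Employee b
                     WHERE a.emp_name = b.emp_name
                       AND a.emp_address = b.emp_address
                       AND a.sex = b.sex)
""").fetchall()
print(extra)  # [('uuuu', 'eee', 'm', 's')]

# Delete the extras, keeping the row with the lowest rowid in each group.
conn.execute("""
    DELETE FROM Employee
    WHERE rowid > (SELECT MIN(b.rowid) FROM Employee b
                   WHERE Employee.emp_name = b.emp_name
                     AND Employee.emp_address = b.emp_address
                     AND Employee.sex = b.sex)
""")
remaining = conn.execute("SELECT * FROM Employee ORDER BY emp_name").fetchall()
print(remaining)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```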
Answer by Shahadat Hossain Khan
I know this is an old post, but I recently tested a solution and want to share it in case anyone finds it helpful:
CREATE TABLE tmpTable LIKE yourTable;
INSERT INTO tmpTable (col1, col2, ... colN)
SELECT DISTINCT col1, col2, ... colN
FROM yourTable
WHERE 1;
DROP TABLE yourTable;
RENAME TABLE tmpTable TO yourTable;
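Please note that the column list in the INSERT INTO ... SELECT DISTINCT statement should leave out the primary key: primary key values are always unique, so including them would stop DISTINCT from collapsing the duplicates.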
Thanks.
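SQLite has neither CREATE TABLE ... LIKE nor RENAME TABLE, so a rough equivalent there builds the new table with CREATE TABLE ... AS SELECT DISTINCT and renames it with ALTER TABLE. Be aware that CREATE TABLE ... AS does not copy constraints or indexes from the original table. Sketched through Python's sqlite3 module, with the question's columns standing in for col1 ... colN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable "
    "(emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [("uuuu", "eee", "m", "s"), ("iiii", "iii", "f", "s"), ("uuuu", "eee", "m", "s")],
)

# Build the de-duplicated copy in one step (constraints and indexes are not copied).
conn.execute("""
    CREATE TABLE tmpTable AS
    SELECT DISTINCT emp_name, emp_address, sex, marital_status
    FROM yourTable
""")
# Replace the original table with the copy.
conn.execute("DROP TABLE yourTable")
conn.execute("ALTER TABLE tmpTable RENAME TO yourTable")

rows = conn.execute("SELECT * FROM yourTable ORDER BY emp_name").fetchall()
print(rows)  # [('iiii', 'iii', 'f', 's'), ('uuuu', 'eee', 'm', 's')]
```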