MySQL: How to delete duplicates in SQL table based on multiple fields
Original URL: http://stackoverflow.com/questions/6471463/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by cfrederich
I have a table of games, which is described as follows:
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
But each game has a duplicate entry in the table somewhere, because each game was in the schedules for two teams. Is there an SQL statement I can use to find and delete all the duplicates based on identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?
Answer by N West
In most databases you could use a correlated subquery: find every group of duplicate rows and delete all but the row with the smallest id. MySQL, however, does not let a DELETE read its own target table in a subquery, so there you need a multi-table DELETE with an inner join (the functional equivalent of EXISTS), like so:
delete games from games inner join
(select min(id) minid, date, time,
hometeam_id, awayteam_id, locationcity, locationstate
from games
group by date, time, hometeam_id,
awayteam_id, locationcity, locationstate
having count(1) > 1) as duplicates
on (duplicates.date = games.date
and duplicates.time = games.time
and duplicates.hometeam_id = games.hometeam_id
and duplicates.awayteam_id = games.awayteam_id
and duplicates.locationcity = games.locationcity
and duplicates.locationstate = games.locationstate
and duplicates.minid <> games.id)
To test, replace `delete games from games` with `select * from games`; the query then lists the rows that would be deleted. Don't just run a delete on your DB :-)
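One way to follow that advice is to run the delete inside a transaction, check the result, and only then commit. This is a sketch that assumes `games` uses the InnoDB storage engine; MyISAM tables ignore transactions, so there the `DELETE` would take effect immediately.

```sql
-- Assumes games is an InnoDB table (MyISAM does not support transactions).
START TRANSACTION;

-- Same delete as above: keep the smallest id in every group of identical games.
DELETE games FROM games
INNER JOIN (
    SELECT MIN(id) AS minid, date, time, hometeam_id, awayteam_id,
           locationcity, locationstate
    FROM games
    GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
    HAVING COUNT(*) > 1
) AS duplicates
    ON  duplicates.date          = games.date
    AND duplicates.time          = games.time
    AND duplicates.hometeam_id   = games.hometeam_id
    AND duplicates.awayteam_id   = games.awayteam_id
    AND duplicates.locationcity  = games.locationcity
    AND duplicates.locationstate = games.locationstate
    AND duplicates.minid <> games.id;

-- Sanity check: this should return no rows once the duplicates are gone.
SELECT date, time, hometeam_id, awayteam_id, locationcity, locationstate,
       COUNT(*) AS copies
FROM games
GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
HAVING COUNT(*) > 1;

-- If the check looks right, run COMMIT; otherwise run ROLLBACK to undo the delete.
```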
Answer by Grigor Gevorgyan
You can try a query like this:
DELETE FROM table_name AS t1
WHERE EXISTS (
    SELECT 1 FROM table_name AS t2
    WHERE t2.date = t1.date
      AND t2.time = t1.time
      AND t2.hometeam_id = t1.hometeam_id
      AND t2.awayteam_id = t1.awayteam_id
      AND t2.locationcity = t1.locationcity
      AND t2.locationstate = t1.locationstate
      AND t2.id > t1.id )
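This leaves exactly one row per game in the database: the one with the largest id, because a row is deleted whenever an identical row with a larger id exists. Note that MySQL rejects this statement with error 1093 (the subquery reads from the table being deleted from); the form is valid in databases such as PostgreSQL.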
Answer by Ali Hashemi
What worked best for me was to recreate the table:
CREATE TABLE newtable SELECT * FROM oldtable GROUP BY field1,field2;
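You can then rename the new table (for example with RENAME TABLE). Note that CREATE TABLE ... SELECT does not copy indexes or the AUTO_INCREMENT attribute, and SELECT * with a partial GROUP BY is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5).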
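A sketch of the same recreate-and-rename approach that copies the full table definition with `CREATE TABLE ... LIKE` and chooses the surviving row explicitly (the smallest `id`) instead of relying on `SELECT *` with `GROUP BY`. The names `games_dedup`, `games_old` and `keepers` are hypothetical.

```sql
-- Copy the table definition, including the primary key, indexes and AUTO_INCREMENT.
CREATE TABLE games_dedup LIKE games;

-- Copy exactly one row per game: the one with the smallest id.
INSERT INTO games_dedup
SELECT g.*
FROM games AS g
JOIN (
    SELECT MIN(id) AS id
    FROM games
    GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
) AS keepers ON keepers.id = g.id;

-- Swap both names in one atomic statement; drop games_old after checking the result.
RENAME TABLE games TO games_old, games_dedup TO games;
```

`CREATE TABLE ... LIKE` does not copy foreign key definitions, so re-create any that `games` had, and make sure nothing writes to `games` between the `INSERT` and the `RENAME`.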
Answer by Rem
To get a list of the duplicate entries that match on two fields:
select t.ID, t.field1, t.field2
from (
select field1, field2
from table_name
group by field1, field2
having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2
And to delete only the duplicates, keeping the row with the smallest id in each group:
DELETE x
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;
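Applied to the `games` table from the question, the same self-join delete simply compares all six columns. This sketch keeps the row with the smallest `id` in each group: every row that has an identical row with a smaller `id` is deleted.

```sql
-- Delete x whenever an identical game y exists with a smaller id.
DELETE x
FROM games AS x
JOIN games AS y
  ON  y.date          = x.date
  AND y.time          = x.time
  AND y.hometeam_id   = x.hometeam_id
  AND y.awayteam_id   = x.awayteam_id
  AND y.locationcity  = x.locationcity
  AND y.locationstate = x.locationstate
  AND y.id < x.id;
```

Once the table is clean, a UNIQUE index on these six columns would stop new duplicates from being inserted.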
Answer by Neville Kuyt
select orig.id,
dupl.id
from games orig,
games dupl
where orig.date = dupl.date
and orig.time = dupl.time
and orig.hometeam_id = dupl.hometeam_id
and orig.awayteam_id = dupl.awayteam_id
and orig.locationcity = dupl.locationcity
and orig.locationstate = dupl.locationstate
and orig.id < dupl.id
This lists the duplicates: each `dupl.id` is a row that has an identical row with a smaller id (`orig.id`). You can use it as a subquery to specify the IDs to delete.
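Used as a subquery to pick the IDs to delete, the pair query needs one extra step in MySQL: a `DELETE` may not read its target table in a subquery (error 1093) unless that subquery goes through a derived table that MySQL materializes first. A sketch under that assumption (`to_delete` is an illustrative alias); `DISTINCT` both removes repeated IDs (a game stored three times yields one `dupl.id` twice) and prevents the optimizer from merging the derived table into the outer statement.

```sql
DELETE FROM games
WHERE id IN (
    -- The derived table is materialized before the delete, which avoids error 1093.
    SELECT id FROM (
        SELECT DISTINCT dupl.id
        FROM games AS orig
        JOIN games AS dupl
          ON  orig.date          = dupl.date
          AND orig.time          = dupl.time
          AND orig.hometeam_id   = dupl.hometeam_id
          AND orig.awayteam_id   = dupl.awayteam_id
          AND orig.locationcity  = dupl.locationcity
          AND orig.locationstate = dupl.locationstate
          AND orig.id < dupl.id   -- keep the smallest id in each group
    ) AS to_delete
);
```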
Answer by Wicked Coder
As long as you leave the table's id (primary key) out of the select list and the remaining columns are identical, you can use `SELECT DISTINCT` to avoid getting duplicate results.
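For the `games` table that means listing every column except `id`. This only hides the duplicates in the query result; the rows themselves stay in the table.

```sql
-- Each game appears once in the result because id is not selected.
SELECT DISTINCT date, time, hometeam_id, awayteam_id, locationcity, locationstate
FROM games;
```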
Answer by piotrpo
delete from games
where id not in
(select max(id) from games
group by date, time, hometeam_id, awayteam_id, locationcity, locationstate
);
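MySQL rejects the statement above with error 1093 (a DELETE cannot select from its own target table in a subquery). Workaround: first save the IDs to keep in a temporary table: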
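-- Store the id of the row to keep (the largest id) for every game.
create temporary table temp_table as
select max(id) as id from games
group by date, time, hometeam_id, awayteam_id, locationcity, locationstate;
-- Delete every row whose id is not in that list.
delete from games where id not in (select id from temp_table);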
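The temporary table can also be avoided: wrapping the grouped subquery in a derived table makes MySQL materialize the list of IDs to keep before it deletes anything, so the single-statement form runs without error 1093. A sketch that, like the original, keeps the largest `id` in each group (`keepers` and `keep_id` are illustrative names).

```sql
DELETE FROM games
WHERE id NOT IN (
    -- GROUP BY prevents merging, so this derived table is materialized first.
    SELECT keep_id FROM (
        SELECT MAX(id) AS keep_id
        FROM games
        GROUP BY date, time, hometeam_id, awayteam_id, locationcity, locationstate
    ) AS keepers
);
```

`NOT IN` is safe here because `id` is declared `NOT NULL`, so the subquery never returns a NULL that would make the condition match nothing.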
Answer by limscoder
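-- `table` stands for your table name (TABLE is a reserved word in MySQL).
DELETE FROM `table`
WHERE id IN
  (SELECT t.id
   FROM `table` AS t
   JOIN `table` AS tj ON (t.date = tj.date
     AND t.hometeam_id = tj.hometeam_id
     AND t.awayteam_id = tj.awayteam_id
     ...
     -- Without this, every row matches itself and would be deleted.
     AND t.id > tj.id))
-- MySQL rejects this with error 1093; wrap the subquery as
-- SELECT id FROM (SELECT DISTINCT t.id ...) AS d to make it run there.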