删除除 MySQL 中的一行之外的所有重复行?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4685173/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Delete all Duplicate Rows except for One in MySQL?
提问by Highway of Life
How would I delete all duplicate data from a MySQL Table?
如何从 MySQL 表中删除所有重复数据?
For example, with the following data:
例如,使用以下数据:
SELECT * FROM names;
+----+--------+
| id | name |
+----+--------+
| 1 | google |
| 2 | yahoo |
| 3 | msn |
| 4 | google |
| 5 | google |
| 6 | yahoo |
+----+--------+
I would use SELECT DISTINCT name FROM names;
if it were a SELECT
query.
SELECT DISTINCT name FROM names;
如果它是一个SELECT
查询,我会使用。
How would I do this with DELETE
to only remove duplicates and keep just one record of each?
我将如何执行此操作DELETE
以仅删除重复项并仅保留每个记录?
回答by martin.masa
Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.
编辑器警告:此解决方案在计算上效率低下,可能会导致大表的连接中断。
NB - You needto do this first on a test copyof your table!
注意 - 您需要先在表的测试副本上执行此操作!
When I did it, I found that unless I also included AND n1.id <> n2.id
, it deleted every row in the table.
当我这样做时,我发现除非我还包含AND n1.id <> n2.id
,否则它会删除表中的每一行。
If you want to keep the row with the lowest
id
value:DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
If you want to keep the row with the highest
id
value:DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
如果要保留具有最低
id
值的行:DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
如果要保留具有最高
id
值的行:DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
I used this method in MySQL 5.1
我在 MySQL 5.1 中使用了这种方法
Not sure about other versions.
不确定其他版本。
Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE
, please be advised that using INSERT
and DISTINCT
is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE
, it took more than 2 hours and yet didn't complete.
更新:由于人们在谷歌搜索中删除重复项最终在这里
虽然 OP 的问题是关于DELETE
,但请注意使用INSERT
和DISTINCT
速度要快得多。对于一个有 800 万行的数据库,下面的查询用了 13 分钟,而使用 时DELETE
,用了 2 个多小时,但没有完成。
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
SELECT DISTINCT cellId,attributeId,entityRowId,value
FROM tableName;
回答by OMG Ponies
If you want to keep the row with the lowest id
value:
如果要保留具有最低id
值的行:
DELETE FROM NAMES
WHERE id NOT IN (SELECT *
FROM (SELECT MIN(n.id)
FROM NAMES n
GROUP BY n.name) x)
If you want the id
value that is the highest:
如果您想要id
最高的值:
DELETE FROM NAMES
WHERE id NOT IN (SELECT *
FROM (SELECT MAX(n.id)
FROM NAMES n
GROUP BY n.name) x)
The subquery in a subquery is necessary for MySQL, or you'll get a 1093 error.
MySQL 需要子查询中的子查询,否则会出现 1093 错误。