删除除 MySQL 中的一行之外的所有重复行?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4685173/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 18:18:18  来源:igfitidea点击:

Delete all Duplicate Rows except for One in MySQL?

mysqlsqlduplicates

提问by Highway of Life

How would I delete all duplicate data from a MySQL Table?

如何从 MySQL 表中删除所有重复数据?

For example, with the following data:

例如,使用以下数据:

SELECT * FROM names;

+----+--------+
| id | name   |
+----+--------+
| 1  | google |
| 2  | yahoo  |
| 3  | msn    |
| 4  | google |
| 5  | google |
| 6  | yahoo  |
+----+--------+

I would use SELECT DISTINCT name FROM names;if it were a SELECTquery.

SELECT DISTINCT name FROM names;如果它是一个SELECT查询,我会使用。

How would I do this with DELETEto only remove duplicates and keep just one record of each?

我将如何执行此操作DELETE以仅删除重复项并仅保留每个记录?

回答by martin.masa

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

编辑器警告:此解决方案在计算上效率低下,可能会导致大表的连接中断。

NB - You needto do this first on a test copyof your table!

注意 - 您需要先在表的测试副本上执行此操作!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

当我这样做时,我发现除非我还包含AND n1.id <> n2.id,否则它会删除表中的每一行。

  1. If you want to keep the row with the lowest idvalue:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
    
  2. If you want to keep the row with the highest idvalue:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
    
  1. 如果要保留具有最低id值的行:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
    
  2. 如果要保留具有最高id值的行:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
    

I used this method in MySQL 5.1

我在 MySQL 5.1 中使用了这种方法

Not sure about other versions.

不确定其他版本。



Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE, please be advised that using INSERTand DISTINCTis much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE, it took more than 2 hours and yet didn't complete.

更新:由于人们在谷歌搜索中删除重复项最终在这里
虽然 OP 的问题是关于DELETE,但请注意使用INSERTDISTINCT速度要快得多。对于一个有 800 万行的数据库,下面的查询用了 13 分钟,而使用 时DELETE,用了 2 个多小时,但没有完成。

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
    SELECT DISTINCT cellId,attributeId,entityRowId,value
    FROM tableName;

回答by OMG Ponies

If you want to keep the row with the lowest idvalue:

如果要保留具有最低id值的行:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MIN(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

If you want the idvalue that is the highest:

如果您想要id最高的值:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MAX(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

The subquery in a subquery is necessary for MySQL, or you'll get a 1093 error.

MySQL 需要子查询中的子查询,否则会出现 1093 错误。