MySQL delete duplicate records but keep latest
Original question: http://stackoverflow.com/questions/6107167/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Khuram
I have a table with a unique id field and an email field. Emails get duplicated. For each duplicated email I want to keep only one row, the one with the latest id (the last inserted record).
How can I achieve this?
Answered by Jose Rui Santos
Imagine your table test contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
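To reproduce this data locally, a setup script along these lines should work. The column types are assumptions, since the question only says that id is unique and email can repeat:

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE test (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- The nine sample rows listed above.
INSERT INTO test (id, email) VALUES
    (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
    (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
    (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```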
So we need to find all repeated emails and, for each one, delete every row except the one with the latest id.
In this case aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
    select email
    from test
    group by email
    having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally, we can delete every row for these emails whose id is smaller than LASTID. So the solution is:
delete test
from test
inner join (
    select max(id) as lastId, email
    from test
    where email in (
        select email
        from test
        group by email
        having count(*) > 1
    )
    group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but it should work.
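Since the statement was posted untested, one cautious way to try it is inside a transaction, so the result can be inspected and undone. This is a sketch that assumes test uses a transactional storage engine such as InnoDB; on a non-transactional engine such as MyISAM the ROLLBACK would not restore the rows.

```sql
START TRANSACTION;

-- Same DELETE as above: remove every row older than the latest id
-- of each repeated email.
DELETE test
FROM test
INNER JOIN (
    SELECT MAX(id) AS lastId, email
    FROM test
    WHERE email IN (
        SELECT email
        FROM test
        GROUP BY email
        HAVING COUNT(*) > 1
    )
    GROUP BY email
) duplic ON duplic.email = test.email
WHERE test.id < duplic.lastId;

-- Inspect what is left before deciding.
SELECT id, email FROM test ORDER BY id;

-- Undo the delete; replace with COMMIT once the result looks right.
ROLLBACK;
```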
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
    select max(id) as lastId, email
    from test
    group by email
    having count(*) > 1
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
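To see which rows a DELETE like this would remove before running it on real data, the same join and filter can be turned into a read-only SELECT. A minimal sketch against the sample table, assuming it still holds the original nine rows:

```sql
-- Preview only: same join and WHERE clause as the optimized DELETE,
-- but nothing is modified.
SELECT test.id, test.email
FROM test
INNER JOIN (
    SELECT MAX(id) AS lastId, email
    FROM test
    GROUP BY email
    HAVING COUNT(*) > 1
) duplic ON duplic.email = test.email
WHERE test.id < duplic.lastId
ORDER BY test.id;
-- On the original sample data this lists ids 1, 2, 6 and 7.
```

If the list matches the rows you expect to lose, the SELECT can be swapped back for the DELETE.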
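Another version is the delete provided by Rene Limon:
delete from test
where id not in (
    select max(id)
    from test
    group by email);
Note that MySQL generally rejects this form with error 1093 ("You can't specify target table 'test' for update in FROM clause"), because the subquery reads from the table being deleted from. Wrapping the subquery in a derived table avoids the error.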
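Answered by Gaurav Kandpal
The correct way is:
-- name is the column holding the duplicated value (email in the question).
DELETE FROM `tablename`
WHERE id NOT IN (
    SELECT * FROM (
        SELECT MAX(id) FROM `tablename`
        GROUP BY name
    ) AS keep_ids  -- MySQL requires an alias on every derived table
);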
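Answered by Pulkit Malhotra
Try this method:
-- t1.id < t2.id matches every row that has a newer duplicate,
-- so only the latest id per email survives.
DELETE t1 FROM test t1, test t2
WHERE t1.id < t2.id AND t1.email = t2.email;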
Answered by Jeff Fol
I personally had trouble with the top two voted answers. It's not the cleanest solution, but you can use a temporary table to avoid the issues MySQL has with deleting via a join on the same table.
-- Despite its name, deleteRows holds the ids to keep.
-- MAX(id) keeps the latest row per email; MIN(id) would keep the oldest.
CREATE TEMPORARY TABLE deleteRows
SELECT MAX(id) AS id FROM myTable GROUP BY myTable.email;
DELETE FROM myTable
WHERE id NOT IN (SELECT id FROM deleteRows);
Answered by Michael Sheaver
I must say that the optimized version is one sweet, elegant piece of code, and it works like a charm even when the comparison is performed on a DATETIME column. This is what I used in my script, where I was searching for the latest contract end date for each EmployeeID:
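DELETE CurrentContractData
FROM CurrentContractData
INNER JOIN (
    -- PeriodofPerformanceStartDate and ContractID are neither grouped nor
    -- aggregated, so this subquery is rejected when ONLY_FULL_GROUP_BY is
    -- enabled. The DELETE does not use them, so they can be dropped.
    SELECT
        EmployeeID,
        PeriodofPerformanceStartDate,
        max(PeriodofPerformanceEndDate) as lastDate,
        ContractID
    FROM CurrentContractData
    GROUP BY EmployeeID
    HAVING COUNT(*) > 1
) Duplicate on Duplicate.EmployeeID = CurrentContractData.EmployeeID
WHERE CurrentContractData.PeriodofPerformanceEndDate < Duplicate.lastDate;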
Many thanks!
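One thing to check when comparing on a date instead of a unique id: if two rows for the same employee share the same latest PeriodofPerformanceEndDate, neither is strictly smaller than lastDate, so both survive the DELETE. A sketch of a query that lists such leftover groups, using the table and column names from the answer above:

```sql
-- Run after the DELETE: each row returned is an employee/end-date pair
-- that still has more than one row in the table.
SELECT
    EmployeeID,
    PeriodofPerformanceEndDate,
    COUNT(*) AS rowsRemaining
FROM CurrentContractData
GROUP BY EmployeeID, PeriodofPerformanceEndDate
HAVING COUNT(*) > 1;
```

An empty result means no tied rows remain. Rows with a NULL end date are also left in place by the DELETE, since a comparison with NULL is never true; this query reports them only when an employee has more than one.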