Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. Original URL: http://stackoverflow.com/questions/6107167/
You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow

Time: 2020-08-31 19:59:08  Source: igfitidea

MySQL delete duplicate records but keep latest

mysql, duplicates

Asked by Khuram

I have a table with a unique id field and an email field. Emails get duplicated. For each duplicated email address I want to keep only one row: the one with the latest id (the last inserted record).

How can I achieve this?

Answered by Jose Rui Santos

Imagine your table test contains the following data:

  select id, email
    from test;

ID                     EMAIL                
---------------------- -------------------- 
1                      aaa                  
2                      bbb                  
3                      ccc                  
4                      bbb                  
5                      ddd                  
6                      eee                  
7                      aaa                  
8                      aaa                  
9                      eee 
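
To follow along, the sample table can be recreated with something like the script below. This is only a sketch: the column types (INT, VARCHAR(255)) and the AUTO_INCREMENT primary key are assumptions, since the question does not show the actual schema.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE test (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL
);

-- Same nine rows as the listing above.
INSERT INTO test (id, email) VALUES
  (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
  (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
  (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```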

So, we need to find all repeated emails and, for each one, delete every row except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.

To accomplish this, first we need to find all the repeated emails:

      select email 
        from test
       group by email
      having count(*) > 1;

EMAIL                
-------------------- 
aaa                  
bbb                  
eee  

Then, from this dataset, we need to find the latest id for each one of these repeated emails:

  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email;

LASTID                 EMAIL                
---------------------- -------------------- 
8                      aaa                  
4                      bbb                  
9                      eee                                 

Finally, we can delete every row for these emails whose id is smaller than LASTID. So the solution is:

delete test
  from test
 inner join (
  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email
) duplic on duplic.email = test.email
 where test.id < duplic.lastId;

I don't have MySQL installed on this machine right now, but this should work.
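
One way to check the statement before running it is a dry run: keep the same join and filter, but select the rows instead of deleting them. A sketch, assuming the sample data shown earlier:

```sql
-- Dry run: lists the rows the delete above would remove.
SELECT test.id, test.email, duplic.lastId
  FROM test
 INNER JOIN (
   SELECT MAX(id) AS lastId, email
     FROM test
    WHERE email IN (
              SELECT email
                FROM test
               GROUP BY email
              HAVING COUNT(*) > 1
       )
    GROUP BY email
 ) duplic ON duplic.email = test.email
 WHERE test.id < duplic.lastId
 ORDER BY test.id;
-- With the sample data this returns ids 1, 2, 6 and 7.
```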

Update

The above delete works, but I found a more optimized version:

 delete test
   from test
  inner join (
     select max(id) as lastId, email
       from test
      group by email
     having count(*) > 1) duplic on duplic.email = test.email
  where test.id < duplic.lastId;

You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:

select * from test;
+----+-------+
| id | email |
+----+-------+
|  3 | ccc   |
|  4 | bbb   |
|  5 | ddd   |
|  8 | aaa   |
|  9 | eee   |
+----+-------+
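
As a further check, the duplicate-finding query from the first step can be rerun after the delete. A sketch, where the alias copies is just an illustrative name:

```sql
-- Any email still listed here has more than one row left.
SELECT email, COUNT(*) AS copies
  FROM test
 GROUP BY email
HAVING COUNT(*) > 1;
-- After the delete, this should return an empty set for the sample data.
```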

Another version is the delete provided by Rene Limon:

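-- Note: MySQL may reject this form with error 1093, because the target
-- table also appears in the subquery.
delete from test
 where id not in (
    select max(id)
      from test
     group by email);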

Answered by Gaurav Kandpal

The correct way is:

DELETE FROM `tablename`
  WHERE id NOT IN (
    SELECT * FROM (
      SELECT MAX(id) FROM tablename
        GROUP BY name
    ) AS keep_ids  -- MySQL requires an alias on every derived table
  );

Answered by Pulkit Malhotra

Try this method:

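-- t1 is deleted whenever a newer row (t2) has the same email,
-- so the highest id per email is the one that remains.
DELETE t1 FROM test t1, test t2
WHERE t1.id < t2.id AND t1.email = t2.email;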
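
The direction of the id comparison decides which copy survives: deleting t1 when t1.id < t2.id keeps the newest row, while t1.id > t2.id would keep the oldest. A sketch of the keep-newest form with explicit JOIN syntax, which should behave the same as a comma join (the aliases t1 and t2 follow the answer above):

```sql
-- Keep-newest variant of the self-join delete.
DELETE t1
  FROM test AS t1
  JOIN test AS t2
    ON t2.email = t1.email   -- same email address
   AND t2.id > t1.id;        -- a newer row exists, so t1 is deleted
```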

Answered by Jeff Fol

I personally had trouble with the two top-voted answers. It's not the cleanest solution, but you can use a temporary table to avoid the issues MySQL has with deleting via a join on the same table.

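-- Despite its name, deleteRows holds the ids to keep: one per email.
-- MAX(id) keeps the latest row; MIN(id) would keep the oldest.
CREATE TEMPORARY TABLE deleteRows
SELECT MAX(id) AS id FROM myTable GROUP BY myTable.email;

DELETE FROM myTable
WHERE id NOT IN (SELECT id FROM deleteRows);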

Answered by Michael Sheaver

I must say that the optimized version is one sweet, elegant piece of code, and it works like a charm even when the comparison is performed on a DATETIME column. This is what I used in my script, where I was searching for the latest contract end date for each EmployeeID:

DELETE CurrentContractData
  FROM CurrentContractData
  INNER JOIN (
    -- Only the grouped column and the aggregate are selected; extra
    -- non-aggregated columns fail when ONLY_FULL_GROUP_BY is enabled.
    SELECT
      EmployeeID,
      MAX(PeriodofPerformanceEndDate) AS lastDate
    FROM CurrentContractData
    GROUP BY EmployeeID
    HAVING COUNT(*) > 1) Duplicate ON Duplicate.EmployeeID = CurrentContractData.EmployeeID
    WHERE CurrentContractData.PeriodofPerformanceEndDate < Duplicate.lastDate;

Many thanks!