MySQL: find and remove duplicate rows by two columns

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14340820/

Time: 2020-08-31 16:11:51  Source: igfitidea

Find and remove duplicate rows by two columns

Tags: mysql, sql, indexing, duplicate-removal

Asked by user838437

I read all the relevant duplicated questions/answers and I found this to be the most relevant answer:

INSERT IGNORE INTO temp (MAILING_ID, REPORT_ID)
SELECT DISTINCT MAILING_ID, REPORT_ID FROM table_1;

The problem is that I want to remove duplicates by col1 and col2, but I also want the insert to include all the other fields of table_1.

I tried to add all the relevant columns this way:

INSERT IGNORE INTO temp (M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED)
SELECT DISTINCT M_ID, MAILING_ID, REPORT_ID, MAILING_NAME, VISIBILITY, EXPORTED
FROM table_1;


Columns: M_ID (int, primary), MAILING_ID (int), REPORT_ID (int), MAILING_NAME (varchar), VISIBILITY (varchar), EXPORTED (int)

But it inserted all rows into temp (including duplicates).
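
A likely explanation, given that M_ID is listed as the primary key: DISTINCT compares the whole select list, so once M_ID is selected no two rows are equal, and INSERT IGNORE only skips rows that violate a unique key on the target table. The sketch below illustrates the first point with hypothetical sample data; the column types and values are assumptions, not taken from the question.

```sql
-- Hypothetical sample data; column types are simplified assumptions.
CREATE TABLE table_1 (
  M_ID         INT PRIMARY KEY,
  MAILING_ID   INT,
  REPORT_ID    INT,
  MAILING_NAME VARCHAR(50),
  VISIBILITY   VARCHAR(20),
  EXPORTED     INT
);
INSERT INTO table_1 VALUES
  (1, 10, 100, 'spring', 'public',  0),
  (2, 10, 100, 'spring', 'public',  0),
  (3, 20, 200, 'summer', 'private', 1);

-- With M_ID in the select list every row is distinct: the count is 3.
SELECT COUNT(*) AS distinct_rows
FROM (SELECT DISTINCT M_ID, MAILING_ID, REPORT_ID FROM table_1) AS d;

-- Only two distinct (MAILING_ID, REPORT_ID) pairs exist: the count is 2.
SELECT COUNT(DISTINCT MAILING_ID, REPORT_ID) AS distinct_pairs
FROM table_1;
```

If the two counts differ on the real table, the duplicates are in the pair of columns only, and deduplication has to be keyed on that pair rather than on the full row.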

Answered by guidod

The best way to delete duplicate rows by multiple columns is the simplest one:

Add a UNIQUE index:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

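The IGNORE above makes sure that only the first row found is kept and the rest are discarded. Note that ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in 5.7.4, so this statement only works on older versions.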

(You can then drop that index if you need future duplicates and/or know they won't happen again).
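
On servers where ALTER IGNORE TABLE is unavailable (it was removed in MySQL 5.7.4), a similar effect can be approximated by copying the rows into a table that already has the unique index. This is only a sketch under assumptions: the table layout, the `id` column, and the names your_table_dedup, your_table_old and uq_fields are hypothetical.

```sql
-- Hypothetical table; field names follow the answer's placeholders.
CREATE TABLE your_table (
  id     INT PRIMARY KEY,
  field1 INT,
  field2 INT,
  field3 INT
);
INSERT INTO your_table VALUES (1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 2, 2);

-- Build an empty copy that already carries the unique index.
CREATE TABLE your_table_dedup LIKE your_table;
ALTER TABLE your_table_dedup ADD UNIQUE KEY uq_fields (field1, field2, field3);

-- INSERT IGNORE skips rows that would violate the unique index,
-- so the first row inserted for each combination is the one kept.
INSERT IGNORE INTO your_table_dedup
SELECT * FROM your_table ORDER BY id;

-- Swap the tables in one statement; the original stays around as a backup.
RENAME TABLE your_table TO your_table_old,
             your_table_dedup TO your_table;

SELECT id FROM your_table ORDER BY id;  -- ids 1 and 3 remain
```

Rows written to the original table between the copy and the rename would not be carried over, so this is best done while nothing else is writing to the table.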

Answered by LStarky

This works in any version of MySQL, including 5.7+. It also avoids the error "You can't specify target table 'my_table' for update in FROM clause" by using a double-nested subquery. It only deletes ONE duplicate row (the later one) per group, so if you have 3 or more duplicates you can run the query multiple times. It never deletes unique rows.

DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
)

I needed this query because I wanted to add a UNIQUE index on two columns but there were some duplicate rows that I needed to discard first.
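
To make the "run it multiple times" behaviour concrete, here is a small sketch with hypothetical sample rows; the explicit ids and the index name uq_ident are assumptions made for illustration.

```sql
-- Hypothetical sample data: one triple, one pair, one unique row.
CREATE TABLE my_table (
  id          INT PRIMARY KEY,
  identField1 INT,
  identField2 INT
);
INSERT INTO my_table VALUES
  (1, 1, 10), (2, 1, 10), (3, 1, 10),
  (4, 2, 20), (5, 2, 20),
  (6, 3, 30);

-- Run 1 deletes ids 3 and 5 (the highest id of each duplicated group).
-- Run 2 deletes id 2. Run 3 affects 0 rows, so the table is clean.
DELETE FROM my_table
WHERE id IN (
  SELECT calc_id FROM (
    SELECT MAX(id) AS calc_id
    FROM my_table
    GROUP BY identField1, identField2
    HAVING COUNT(id) > 1
  ) temp
);

-- Once a run affects 0 rows, the unique index can be added safely.
ALTER TABLE my_table ADD UNIQUE KEY uq_ident (identField1, identField2);
```

After the final run only ids 1, 4 and 6 remain, one row per (identField1, identField2) pair.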

Answered by Scotch

You will first need to find your duplicates by grouping on the two fields with a having clause.

SELECT identField1, identField2, COUNT(*)
FROM yourTable
GROUP BY identField1, identField2
HAVING COUNT(*) > 1

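If this returns what you want, you can then use it as a subquery in a DELETE. The two fields are compared as a pair, and the subquery is wrapped in a derived table because MySQL does not let a DELETE select directly from its own target table. Be aware that this removes every row of each duplicated pair, including the copy you may want to keep:

DELETE FROM yourTable
WHERE (identField1, identField2) IN (
  SELECT identField1, identField2 FROM (
    SELECT identField1, identField2 FROM yourTable
    GROUP BY identField1, identField2
    HAVING COUNT(*) > 1) AS dup)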
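
If one row per pair should survive, a variant is to delete everything except the lowest id of each group. This is a sketch that assumes the table has a unique `id` column, which this answer does not mention; the alias names keep_id and keepers are made up.

```sql
-- Hypothetical sample data with an assumed id primary key.
CREATE TABLE yourTable (
  id          INT PRIMARY KEY,
  identField1 INT,
  identField2 INT
);
INSERT INTO yourTable VALUES
  (1, 1, 10), (2, 1, 10), (3, 1, 10),
  (4, 2, 20), (5, 2, 20),
  (6, 3, 30);

-- Keep the lowest id of every (identField1, identField2) group.
-- The extra derived table avoids selecting from the DELETE target directly.
DELETE FROM yourTable
WHERE id NOT IN (
  SELECT keep_id FROM (
    SELECT MIN(id) AS keep_id
    FROM yourTable
    GROUP BY identField1, identField2
  ) AS keepers
);

SELECT id FROM yourTable ORDER BY id;  -- ids 1, 4 and 6 remain
```

Unlike the grouped HAVING form, this removes all extra copies in a single pass and leaves non-duplicated rows untouched.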

Answered by Shashikant Sharma

For MySQL:

DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id;
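
A short sketch of what this self-join does, using hypothetical sample rows: a row is deleted whenever another row with the same pair and a higher id exists, so the highest id of each group is the one that survives. The preview query is an addition for illustration, not part of the answer.

```sql
-- Hypothetical sample data.
CREATE TABLE yourtable (
  id          INT PRIMARY KEY,
  identField1 INT,
  identField2 INT
);
INSERT INTO yourtable VALUES
  (1, 1, 10), (2, 1, 10), (3, 1, 10),
  (4, 2, 20), (5, 2, 20),
  (6, 3, 30);

-- Preview the rows the DELETE would remove: ids 1, 2 and 4.
SELECT DISTINCT t1.id
FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id
ORDER BY t1.id;

-- Same join as a multi-table DELETE; flip "<" to ">" to keep the lowest id.
DELETE t1 FROM yourtable t1
  INNER JOIN yourtable t2
    ON t1.identField1 = t2.identField1
   AND t1.identField2 = t2.identField2
WHERE t1.id < t2.id;

SELECT id FROM yourtable ORDER BY id;  -- ids 3, 5 and 6 remain
```

On large tables an index covering the two columns generally helps the join find matching rows without scanning the whole table for every row.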

Answered by Bahadir Tasdemir

NOTE: This is an alternative, old-school solution.

If you couldn't achieve what you wanted, then you can try my "oldschool" method:

First, run this query to get the duplicate records:

select   column1,
         column2,
         count(*)
from     table
group by column1,
         column2
having   count(*) > 1
order by count(*) desc

After that, select those results and paste them into Notepad++:

Now use the find-and-replace feature of Notepad++ to turn each line into a "delete" query followed by an "insert" query, like this (from now on, for security reasons, my values will be AAAA).

Special note: add a new line after the last line of your data in Notepad++, because the regex matches the '\r\n' at the end of each line:

Find what regex: \D*(\d+)\D*(\d+)\D*\r\n

Replace with string: delete from table where column1 = $1 and column2 = $2; insert into table set column1 = $1, column2 = $2;\r\n

Finally, paste those queries into the MySQL Workbench query console and execute them. You will then see only one occurrence of each duplicate record.
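
The delete/insert statements produced by the find-and-replace step could also be generated by the server itself with CONCAT. This is a sketch under assumptions: the table name relation_table is hypothetical (the answer's `table` is a placeholder and also a reserved word), and both columns are assumed to be non-NULL integers, as the `\d+` groups in the regex suggest.

```sql
-- Hypothetical two-column relation table without an id column.
CREATE TABLE relation_table (column1 INT, column2 INT);
INSERT INTO relation_table VALUES
  (1, 10), (1, 10), (1, 10),
  (2, 20),
  (3, 30), (3, 30);

-- Build one delete/insert pair per duplicated (column1, column2) combination.
-- With the sample data this returns two rows: one for (1, 10), one for (3, 30).
SELECT CONCAT(
         'delete from relation_table where column1 = ', column1,
         ' and column2 = ', column2, '; ',
         'insert into relation_table set column1 = ', column1,
         ', column2 = ', column2, ';'
       ) AS stmt
FROM relation_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;
```

The generated text still has to be copied out and executed, just like the Notepad++ output. CONCAT returns NULL if any argument is NULL, so rows with NULL values would need separate handling.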

This answer is for a relation table made up of just two columns, without an ID column. I think you can apply it to your situation.

Answered by Sudhanshu Jain

You can always get the primary ids by grouping on the two fields that should be unique:

select count(*) as cnt, id from your_table group by col_a, col_b having count(*) > 1;

And then:

delete from your_table where id in (select id from (select id from your_table group by col_a, col_b having count(*) > 1) as dup) limit maxlimit;

You can also use max(id) in place of limit. Note that selecting a bare id alongside GROUP BY is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default in MySQL 5.7 and later; in that case max(id) is required.
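
Before deleting anything, it can help to list which primary ids belong to each duplicated pair. The sketch below uses GROUP_CONCAT for that; the table name your_table, the column names col_a and col_b, and the sample rows are hypothetical.

```sql
-- Hypothetical sample data.
CREATE TABLE your_table (
  id    INT PRIMARY KEY,
  col_a INT,
  col_b INT
);
INSERT INTO your_table VALUES
  (1, 1, 10), (2, 1, 10), (3, 1, 10),
  (4, 2, 20), (5, 2, 20),
  (6, 3, 30);

-- One row per duplicated pair, with the id to keep and all ids in the group.
-- (1, 10): cnt 3, keep_id 1, all_ids '1,2,3'
-- (2, 20): cnt 2, keep_id 4, all_ids '4,5'
SELECT col_a,
       col_b,
       COUNT(*)                     AS cnt,
       MIN(id)                      AS keep_id,
       GROUP_CONCAT(id ORDER BY id) AS all_ids
FROM your_table
GROUP BY col_a, col_b
HAVING COUNT(*) > 1;
```

GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so very large duplicate groups may not show every id.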

Answered by Govind

On a large data set, suppose you select multiple columns, e.g. select x, y, z from table1, and need to remove duplicates based on two of them, say y and z. You can then use the query below instead of combining "group by" with a subquery, which can perform poorly. Note that row_number() is a window function and requires MySQL 8.0 or later:

select x, y, z
from (
  select x, y, z, row_number() over (partition by y, z) as index_num
  from table1
) main
where main.index_num = 1
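
The query above only filters duplicates out of a result set. To actually delete them, the same row_number() idea could be combined with a DELETE. This is a sketch that assumes MySQL 8.0 or later and a unique `id` column on table1, which the answer does not mention; the alias ranked is made up.

```sql
-- Hypothetical sample data with an assumed id primary key.
CREATE TABLE table1 (
  id INT PRIMARY KEY,
  x  INT,
  y  INT,
  z  INT
);
INSERT INTO table1 VALUES
  (1, 7, 1, 10),
  (2, 8, 1, 10),
  (3, 9, 2, 20);

-- Number the rows inside each (y, z) group, lowest id first,
-- then delete every row that is not number 1 in its group.
DELETE FROM table1
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY y, z ORDER BY id) AS index_num
    FROM table1
  ) AS ranked
  WHERE index_num > 1
);

SELECT id FROM table1 ORDER BY id;  -- ids 1 and 3 remain
```

The ORDER BY inside the window makes the surviving row deterministic; without it, which row gets number 1 in each group is not guaranteed.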