在 MySQL 中查找重复记录

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/854128/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 13:15:52  来源:igfitidea点击:

Find duplicate records in MySQL

mysqlduplicates

提问by Chris Bartow

I want to pull out duplicate records in a MySQL Database. This can be done with:

我想提取 MySQL 数据库中的重复记录。这可以通过以下方式完成:

SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1

Which results in:

结果是:

100 MAIN ST    2

I would like to pull it so that it shows each row that is a duplicate. Something like:

我想拉它,以便它显示重复的每一行。就像是:

JIM    JONES    100 MAIN ST
JOHN   SMITH    100 MAIN ST

Any thoughts on how this can be done? I'm trying to avoid doing the first one then looking up the duplicates with a second query in the code.

关于如何做到这一点的任何想法?我试图避免做第一个然后在代码中使用第二个查询查找重复项。

回答by Powerlord

The key is to rewrite this query so that it can be used as a subquery.

关键是重写这个查询,以便它可以用作子查询。

SELECT firstname, 
   lastname, 
   list.address 
FROM list
   INNER JOIN (SELECT address
               FROM   list
               GROUP  BY address
               HAVING COUNT(id) > 1) dup
           ON list.address = dup.address;

回答by trt

SELECT date FROM logs group by date having count(*) >= 2

回答by rudolfson

Why not just INNER JOIN the table with itself?

为什么不只是 INNER JOIN 表本身?

SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id

A DISTINCT is needed if the address could exist more than two times.

如果地址可以存在两次以上,则需要 DISTINCT。

回答by Arman Malik

I tried the best answer chosen for this question, but it confused me somewhat. I actually needed that just on a single field from my table. The following example from this linkworked out very well for me:

我尝试了为这个问题选择的最佳答案,但它让我有些困惑。我实际上只需要在我的桌子上的一个字段上使用它。此链接中的以下示例对我来说效果很好:

SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1;

回答by pratswinz

select `cityname` from `codcities` group by `cityname` having count(*)>=2

This is the similar query you have asked for and its 200% working and easy too. Enjoy!!!

这是您要求的类似查询,它也 200% 工作且简单。享受!!!

回答by Tudor

Isn't this easier :

这不是更容易吗:

SELECT *
FROM tc_tariff_groups
GROUP BY group_id
HAVING COUNT(group_id) >1

?

?

回答by doublejosh

Find duplicate users by email addresswith this query...

使用此查询通过电子邮件地址查找重复用户...

SELECT users.name, users.uid, users.mail, from_unixtime(created)
FROM users
INNER JOIN (
  SELECT mail
  FROM users
  GROUP BY mail
  HAVING count(mail) > 1
) dupes ON users.mail = dupes.mail
ORDER BY users.mail;

回答by KESAVAN PURUSOTHAMAN

we can found the duplicates depends on more then one fields also.For those cases you can use below format.

我们可以发现重复项也取决于多个字段。对于这些情况,您可以使用以下格式。

SELECT COUNT(*), column1, column2 
FROM tablename
GROUP BY column1, column2
HAVING COUNT(*)>1;

回答by Matt

Finding duplicate addressesis much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...

查找重复地址比看起来要复杂得多,尤其是在您需要准确性的情况下。在这种情况下,MySQL 查询是不够的......

I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I've seen a lot of diverse challenges with similar problems.

我在SmartyStreets工作,我们在那里解决验证和重复数据删除以及其他问题,我看到了许多具有类似问题的不同挑战。

There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US address) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.

有几种第三方服务可以为您标记列表中的重复项。仅使用 MySQL 子查询执行此操作不会考虑地址格式和标准的差异。USPS(用于美国地址)有制定这些标准的某些指导方针,但只有少数供应商获得了执行此类操作的认证。

So, I would recommend the best answer for you is to export the table into a CSV file, for instance, and submit it to a capable list processor. One such is SmartyStreets Bulk Address Validation Toolwhich will have it done for you in a few seconds to a few minutes automatically. It will flag duplicate rows with a new field called "Duplicate" and a value of Yin it.

因此,我建议您最好的答案是将表格导出为 CSV 文件,例如,并将其提交给功能强大的列表处理器。其中一个是 SmartyStreets批量地址验证工具,它将在几秒钟到几分钟内自动为您完成。它将使用名为“Duplicate”的新字段和其中的值标记重复行Y

回答by jerdiggity

Another solution would be to use table aliases, like so:

另一种解决方案是使用表别名,如下所示:

SELECT p1.id, p2.id, p1.address
FROM list AS p1, list AS p2
WHERE p1.address = p2.address
AND p1.id != p2.id

All you're really doing in this case is taking the original listtable, creating two pretend tables -- p1and p2-- out of that, and then performing a join on the address column (line 3). The 4th line makes sure that the same record doesn't show up multiple times in your set of results ("duplicate duplicates").

所有你真的这样做在这种情况下,走的是原来的名单表,创建两个pretend表- p1p2-出这一点,那么对地址栏(3号线)的联接。第 4 行确保相同的记录不会在您的结果集中出现多次(“重复的重复项”)。