SQL 如何在多列中查找重复项?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/8149210/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How do I find duplicates across multiple columns?
提问by NimChimpsky
So I want to do something like this sql code below:
所以我想做类似下面这个sql代码的事情:
select s.id, s.name,s.city
from stuff s
group by s.name having count(where city and name are identical) > 1
To produce the following, (but ignore where only name or only city match, it has to be on both columns):
要产生以下内容,(但忽略只有名称或只有城市匹配的地方,它必须在两列上):
id name city
904834 jim London
904835 jim London
90145 Fred Paris
90132 Fred Paris
90133 Fred Paris
回答by Micha? Powaga
Duplicated id
for pairs name
and city
:
id
对name
和重复city
:
select s.id, t.*
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
回答by Sunnny
SELECT name, city, count(*) as qty
FROM stuff
GROUP BY name, city HAVING count(*)> 1
回答by ssarabando
Something like this will do the trick. Don't know about performance, so do make some tests.
像这样的事情会成功。不知道性能,所以做一些测试。
select
id, name, city
from
[stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
回答by Used_By_Already
Using count(*) over(partition by...)
provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:
Usingcount(*) over(partition by...)
提供了一种简单有效的方法来定位不需要的重复,同时还列出所有受影响的行和所有想要的列:
SELECT
t.*
FROM (
SELECT
s.*
, COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
FROM stuff s
) t
WHERE t.qty > 1
ORDER BY t.name, t.city
While most recent RDBMS versions support count(*) over(partition by...)
MySQL V 8.0introduced "window functions", as seen below (in MySQL 8.0)
虽然最新的 RDBMS 版本支持count(*) over(partition by...)
MySQL V 8.0引入了“窗口函数”,如下所示(在 MySQL 8.0 中)
CREATE TABLE stuff( id INTEGER NOT NULL ,name VARCHAR(60) NOT NULL ,city VARCHAR(60) NOT NULL );
CREATE TABLE stuff( id INTEGER NOT NULL ,name VARCHAR(60) NOT NULL ,city VARCHAR(60) NOT NULL );
INSERT INTO stuff(id,name,city) VALUES (904834,'jim','London') , (904835,'jim','London') , (90145,'Fred','Paris') , (90132,'Fred','Paris') , (90133,'Fred','Paris') , (923457,'Barney','New York') # not expected in result ;
INSERT INTO stuff(id,name,city) VALUES (904834,'jim','London') , (904835,'jim','London') , (90145,'Fred','Paris') , (90132,'Fred','Paris') , (90133,'Fred','Paris') , (923457,'Barney','New York') # not expected in result ;
SELECT t.* FROM ( SELECT s.* , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty FROM stuff s ) t WHERE t.qty > 1 ORDER BY t.name, t.city
id | name | city | qty -----: | :--- | :----- | --: 90145 | Fred | Paris | 3 90132 | Fred | Paris | 3 90133 | Fred | Paris | 3 904834 | jim | London | 2 904835 | jim | London | 2
SELECT t.* FROM ( SELECT s.* , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty FROM stuff s ) t WHERE t.qty > 1 ORDER BY t.name, t.city
id | name | city | qty -----: | :--- | :----- | --: 90145 | Fred | Paris | 3 90132 | Fred | Paris | 3 90133 | Fred | Paris | 3 904834 | jim | London | 2 904835 | jim | London | 2
db<>fiddle here
db<>在这里摆弄
Window functions.MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.
窗口函数。MySQL 现在支持窗口函数,对于查询中的每一行,使用与该行相关的行执行计算。其中包括 RANK()、LAG() 和 NTILE() 等函数。此外,几个现有的聚合函数现在可以用作窗口函数;例如,SUM() 和 AVG()。有关更多信息,请参阅第 12.21 节,“窗口函数”。
回答by MattD
A little late to the game on this post, but I found this way to be pretty flexible / efficient
这篇文章的游戏有点晚了,但我发现这种方式非常灵活/高效
select
s1.id
,s1.name
,s1.city
from
stuff s1
,stuff s2
Where
s1.id <> s2.id
and s1.name = s2.name
and s1.city = s2.city
回答by Anja
You have to self join stuff and match name and city. Then group by count.
你必须自己加入东西并匹配名称和城市。然后按计数分组。
select
s.id, s.name, s.city
from stuff s join stuff p ON (
s.name = p.city OR s.city = p.name
)
group by s.name having count(s.name) > 1
回答by Don G.
Given a staging table with 70 columns and only 4 representing duplicates, this code will return the offending columns:
给定一个包含 70 列且只有 4 列表示重复项的临时表,此代码将返回有问题的列:
SELECT
COUNT(*)
,LTRIM(RTRIM(S.TransactionDate))
,LTRIM(RTRIM(S.TransactionTime))
,LTRIM(RTRIM(S.TransactionTicketNumber))
,LTRIM(RTRIM(GrossCost))
FROM Staging.dbo.Stage S
GROUP BY
LTRIM(RTRIM(S.TransactionDate))
,LTRIM(RTRIM(S.TransactionTime))
,LTRIM(RTRIM(S.TransactionTicketNumber))
,LTRIM(RTRIM(GrossCost))
HAVING COUNT(*) > 1
.
.