SQL 如何在多列中查找重复项？

Question

提问by NimChimpsky

So I want to do something like this sql code below:

所以我想做类似下面这个sql代码的事情：

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

To produce the following, (but ignore where only name or only city match, it has to be on both columns):

要产生以下内容，（但忽略只有名称或只有城市匹配的地方，它必须在两列上）：

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

Answer 1

回答by Micha? Powaga

Duplicated idfor pairs nameand city:

id对name和重复city：

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city

Answer 2

回答by Sunnny

 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1

Answer 3

回答by ssarabando

Something like this will do the trick. Don't know about performance, so do make some tests.

像这样的事情会成功。不知道性能，所以做一些测试。

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)

Answer 4

回答by Used_By_Already

Using count(*) over(partition by...)provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:

Usingcount(*) over(partition by...)提供了一种简单有效的方法来定位不需要的重复，同时还列出所有受影响的行和所有想要的列：

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

While most recent RDBMS versions support count(*) over(partition by...)MySQL V 8.0introduced "window functions", as seen below (in MySQL 8.0)

虽然最新的 RDBMS 版本支持count(*) over(partition by...)MySQL V 8.0引入了“窗口函数”，如下所示（在 MySQL 8.0 中）

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);

INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;

INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

db<>fiddle here

db<>在这里摆弄

Window functions.MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.

窗口函数。MySQL 现在支持窗口函数，对于查询中的每一行，使用与该行相关的行执行计算。其中包括 RANK()、LAG() 和 NTILE() 等函数。此外，几个现有的聚合函数现在可以用作窗口函数；例如，SUM() 和 AVG()。有关更多信息，请参阅第 12.21 节，“窗口函数”。

Answer 5

回答by MattD

A little late to the game on this post, but I found this way to be pretty flexible / efficient

这篇文章的游戏有点晚了，但我发现这种方式非常灵活/高效

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city

Answer 6

回答by Anja

You have to self join stuff and match name and city. Then group by count.

你必须自己加入东西并匹配名称和城市。然后按计数分组。

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.city OR s.city = p.name
)
group by s.name having count(s.name) > 1

Answer 7

回答by Don G.

Given a staging table with 70 columns and only 4 representing duplicates, this code will return the offending columns:

给定一个包含 70 列且只有 4 列表示重复项的临时表，此代码将返回有问题的列：

SELECT 
    COUNT(*)
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1

.

SQL 如何在多列中查找重复项？

提问by NimChimpsky

回答by Micha? Powaga

回答by Sunnny

回答by ssarabando

回答by Used_By_Already

回答by MattD

回答by Anja

回答by Don G.

相关推荐

最近更新

标签

SQL 如何在多列中查找重复项？

提问by NimChimpsky

回答by Micha? Powaga

回答by Sunnny

回答by ssarabando

回答by Used_By_Already

回答by MattD

回答by Anja

回答by Don G.

相关推荐

SQL 如何使用一条插入语句将多行插入到 oracle 数据库中？

SQL 在sql server中获取包括列值的行数

SQL HH:MM:SS 格式的 DATEDIFF

SQL 如何避免在连接两个表时重复

相关推荐

最近更新

标签