Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2594829/

Time: 2020-09-01 05:57:04  Source: igfitidea

Finding duplicate values in a SQL table

sql, duplicates

Asked by Alex

It's easy to find duplicates with one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove or change the duplicates, so I need to find them first.

Answered by gbn

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.
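
As a rough illustration of grouping on both columns, the sketch below runs the same query against an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the sample e-mail addresses are assumptions made for this example; they are not taken from the original question.

```python
import sqlite3

# In-memory database; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "shared@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "shared@example.com"),
    ],
)

# Group on both columns so that only exact (name, email) repeats are reported.
duplicate_pairs = conn.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_pairs)
```

With this sample data only the (Tom, shared@example.com) pair occurs more than once, so it is the only group returned.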

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent across database engines.

Answered by KM.

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the duplicates, use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

To delete the duplicates, try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

Answered by Chris Van Opstal

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

Answered by AncAinu

If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows in a triple sub-select:

SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so to delete:

DELETE FROM users
WHERE id IN (
    SELECT u.id/*, u.name, u.email*/
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Much easier to read and understand, IMHO.

Note: The only issue is that you have to execute the request until no rows are deleted, since you delete only one of each duplicate each time.

Answered by gaurav singh

Try the following:

SELECT * FROM
(
    SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
        AS Rank 
        FROM Customers
) AS B WHERE Rank>1

Answered by PRADEEPTA VIRLLEY

SELECT name, email
FROM users
WHERE email IN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
)

Answered by Indivision Dev

A little late to the party but I found a really cool workaround to finding all duplicate IDs:

SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
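
A minimal sketch of the same idea, adapted to the question by grouping on both name and email, is shown below. It assumes SQLite (which also provides GROUP_CONCAT) accessed through Python's sqlite3 module, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "john@example.com"),
        (2, "John", "john@example.com"),
        (3, "Fred", "john@example.com"),
        (4, "Sam", "sam@example.com"),
        (5, "Sam", "sam@example.com"),
        (6, "Sam", "sam@example.com"),
    ],
)

rows = conn.execute(
    """
    SELECT name, email, GROUP_CONCAT(id) AS ids
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

# GROUP_CONCAT yields a comma-separated string whose element order is not
# guaranteed, so split it and sort the ids before using them.
duplicate_ids = {
    (name, email): sorted(int(i) for i in ids.split(","))
    for name, email, ids in rows
}

print(duplicate_ids)
```

Each key is a duplicated (name, email) pair and each value lists every id that carries it, which is convenient when the duplicates need to be reviewed by hand before anything is removed.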

Answered by Tanmay Nehete

Try this code:

WITH CTE AS
(
    SELECT Id, Name, Age, Comments,
           RN = ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY ccn)
    FROM ccnmaster
)
-- Rows with RN > 1 are the repeats within each (Name, Age) group
SELECT * FROM CTE WHERE RN > 1

Answered by Martin Silovsky

This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.

Select duplicates:

SELECT *
FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);

Delete duplicates:

DELETE FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);
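
The select and delete above can be tried end to end with a small sketch like the following. It assumes SQLite through Python's sqlite3 module, uses a hypothetical users table with name and email in place of column1 and column2, and the sample rows are invented.

```python
import sqlite3

# In-memory database; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "john@example.com"),
        (2, "John", "john@example.com"),
        (3, "Fred", "john@example.com"),
        (4, "Sam", "sam@example.com"),
        (5, "Sam", "sam@example.com"),
        (6, "Sam", "sam@example.com"),
    ],
)

# Shared predicate: true for every row that is NOT the lowest id of its group.
NOT_LOWEST_ID = """
    id NOT IN (
        SELECT MIN(id)
        FROM users
        GROUP BY name, email
    )
"""

# Preview the rows that would be removed.
extra_ids = [
    row[0]
    for row in conn.execute(f"SELECT id FROM users WHERE {NOT_LOWEST_ID} ORDER BY id")
]

# Remove them, keeping one row per (name, email) group.
deleted = conn.execute(f"DELETE FROM users WHERE {NOT_LOWEST_ID}").rowcount
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(extra_ids, deleted, remaining_ids)
```

Running the SELECT first is a cheap way to check which rows the DELETE will touch before committing to it.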

Be careful with large numbers of records; this can cause performance problems.

Answered by xDBA

In case you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');

commit;

select *
  from my_users
 where rowid not in (select min(rowid) from my_users group by name, email);