Finding duplicate values in a SQL table
Original URL: http://stackoverflow.com/questions/2594829/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Question by Alex
It's easy to find duplicates with one field:
SELECT name, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table:
ID NAME EMAIL
1 John [email protected]
2 Sam [email protected]
3 Tom [email protected]
4 Bob [email protected]
5 Tom [email protected]
This query will give us John, Sam, Tom, Tom because they all have the same email.
However, what I want is to get duplicates with the same email and name.
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.
Answer by gbn
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
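A minimal, runnable sketch of the two-column grouping, assuming Python's built-in sqlite3 module and an in-memory copy of the question's table. The email addresses are hypothetical stand-ins, since the real values in the question are obscured.

```python
import sqlite3


def make_users():
    """In-memory copy of the question's table; the email addresses are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [
            (1, "John", "asd@example.com"),
            (2, "Sam", "asd@example.com"),
            (3, "Tom", "asd@example.com"),
            (4, "Bob", "bob@example.com"),
            (5, "Tom", "asd@example.com"),
        ],
    )
    return conn


def duplicate_pairs(conn):
    # Grouping on both columns collapses only rows whose (name, email) pair is identical.
    return conn.execute(
        """
        SELECT name, email, COUNT(*)
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        """
    ).fetchall()


print(duplicate_pairs(make_users()))  # [('Tom', 'asd@example.com', 2)]
```

John and Sam share Tom's address but not his name, so they drop out; only the (Tom, email) pair is reported.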
Note: older ANSI SQL standards required every non-aggregated column in the SELECT list to appear in the GROUP BY, but SQL:1999 relaxed this with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as of SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable, and you need sql_mode=only_full_group_by:
  - GROUP BY lname ORDER BY showing wrong results
  - Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer)
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Answer by KM.
Try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
If you want the IDs of the duplicates, use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
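The join-back pattern above can be tried outside SQL Server with a small sketch. It assumes Python's sqlite3 module; because SQLite has no table variables, a regular in-memory table named t (a hypothetical stand-in for @YourTable) holds the same sample rows.

```python
import sqlite3

# SQLite has no table variables, so a regular in-memory table "t" stands in for @YourTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)


def duplicate_rows(conn):
    # The derived table finds the repeated (name, email) pairs;
    # joining back to t recovers the id of every row in those groups.
    return conn.execute(
        """
        SELECT y.id, y.name, y.email
        FROM t AS y
        INNER JOIN (
            SELECT name, email
            FROM t
            GROUP BY name, email
            HAVING COUNT(*) > 1
        ) AS dt ON y.name = dt.name AND y.email = dt.email
        ORDER BY y.id
        """
    ).fetchall()


for row in duplicate_rows(conn):
    print(row)
```

The COUNT(*) column is dropped from the derived table here because only the grouping columns are needed for the join.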
To delete the duplicates, try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
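A simplified variant of the ROW_NUMBER() delete, sketched with Python's sqlite3 module (window functions need SQLite 3.25 or later). Rows without a twin are numbered 1 and survive anyway, so this sketch drops the extra join against the GROUP BY ... HAVING derived table; that simplification is my own, not part of the answer. The table name t is again a hypothetical stand-in for @YourTable.

```python
import sqlite3


def make_table():
    # Same sample rows as the answer; "t" is a stand-in for @YourTable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (1, "John", "John-email"),
            (2, "John", "John-email"),
            (3, "fred", "John-email"),
            (4, "fred", "fred-email"),
            (5, "sam", "sam-email"),
            (6, "sam", "sam-email"),
        ],
    )
    return conn


def dedupe_keep_lowest_id(conn):
    # Number the rows of each (name, email) group by ascending id, then delete
    # every row numbered above 1. Rows without a twin are numbered 1 and survive,
    # so no separate GROUP BY ... HAVING join is required.
    conn.execute(
        """
        DELETE FROM t
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn
                FROM t
            ) AS ranked
            WHERE rn > 1
        )
        """
    )
    return conn.execute("SELECT id, name, email FROM t ORDER BY id").fetchall()


print(dedupe_keep_lowest_id(make_table()))
```

The surviving rows match the answer's output: ids 1, 3, 4 and 5.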
Answer by Chris Van Opstal
Try this:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
Answer by AncAinu
If you want to delete the duplicates, here's a much simpler way to do it than finding even/odd rows with a triple sub-select:
SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
And so to delete:
DELETE FROM users
WHERE id IN (
SELECT u.id
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
Much easier to read and understand, IMHO.
Note: u.id > u2.id matches every row that has a lower-id twin, so a single run should delete all but the lowest-id row of each group. Be aware that MySQL rejects a DELETE whose subquery selects from the same table (error 1093).
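To check how many runs the self-join delete needs, here is a minimal sketch using Python's sqlite3 module on a hypothetical users table that holds three copies of the same row. Under SQLite, the IN subquery is evaluated against the table before any row is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@example.com"),
        (2, "Tom", "tom@example.com"),
        (3, "Tom", "tom@example.com"),  # a third copy of the same row
        (4, "Bob", "bob@example.com"),
    ],
)

# Every row that has a lower-id twin with the same name and email is deleted.
deleted = conn.execute(
    """
    DELETE FROM users
    WHERE id IN (
        SELECT u.id
        FROM users AS u
        JOIN users AS u2
          ON u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    )
    """
).rowcount

remaining = conn.execute("SELECT id FROM users ORDER BY id").fetchall()
print(deleted, remaining)  # 2 [(1,), (4,)]
```

Row 3 matches twice (against rows 1 and 2), but IN only tests membership, so it is deleted once.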
Answer by gaurav singh
Try the following:
SELECT * FROM
(
SELECT Id, Name, Age, Comments, ROW_NUMBER() OVER(PARTITION BY Name, Age ORDER BY Id)
AS RowNum
FROM Customers
) AS B WHERE RowNum > 1
Answer by PRADEEPTA VIRLLEY
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
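This query flags rows by email alone. A sketch, assuming Python's sqlite3 module and hypothetical email values, contrasts it with a row-value variant, (name, email) IN (...), which matches the question's two-column requirement. Row-value IN is supported by SQLite 3.15+, PostgreSQL, MySQL and Oracle, but not by SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@example.com"),
        (2, "Sam", "asd@example.com"),
        (3, "Tom", "asd@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "asd@example.com"),
    ],
)

# Email only: anyone who shares an address with someone else.
by_email = conn.execute(
    """
    SELECT id, name FROM users
    WHERE email IN (SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1)
    ORDER BY id
    """
).fetchall()

# Row-value variant: only rows whose (name, email) pair repeats.
by_pair = conn.execute(
    """
    SELECT id, name FROM users
    WHERE (name, email) IN (
        SELECT name, email FROM users GROUP BY name, email HAVING COUNT(*) > 1
    )
    ORDER BY id
    """
).fetchall()

print(by_email)  # [(1, 'John'), (2, 'Sam'), (3, 'Tom'), (5, 'Tom')]
print(by_pair)   # [(3, 'Tom'), (5, 'Tom')]
```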
Answer by Indivision Dev
A little late to the party, but I found a really cool workaround for finding all duplicate IDs:
SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
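GROUP_CONCAT exists in MySQL and SQLite (PostgreSQL and SQL Server 2017+ offer STRING_AGG instead). A minimal sketch with Python's sqlite3 module, grouping on both name and email to match the question rather than on email alone as above. The email values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@example.com"),
        (2, "Sam", "asd@example.com"),
        (3, "Tom", "asd@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "asd@example.com"),
    ],
)

rows = conn.execute(
    """
    SELECT name, email, GROUP_CONCAT(id) AS ids
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

# GROUP_CONCAT returns text such as "3,5"; its internal order is not guaranteed,
# so split it and sort the ids in Python.
dup_ids = {
    (name, email): sorted(int(i) for i in ids.split(","))
    for name, email, ids in rows
}
print(dup_ids)  # {('Tom', 'asd@example.com'): [3, 5]}
```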
Answer by Tanmay Nehete
Try this code:
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE where RN > 1
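The CTE approach can be sketched with Python's sqlite3 module (window functions need SQLite 3.25+). The customers table, its rows, and ordering by Id are hypothetical choices for illustration; the answer orders by a column named ccn that is not shown. SQLite does not accept the T-SQL alias form RN = ..., so the sketch uses AS RN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (Id INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Comments TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", 30, "first"),
        (2, "Ann", 30, "second"),
        (3, "Ben", 40, "only"),
        (4, "Ann", 31, "different age"),
    ],
)

# RN restarts at 1 for every (Name, Age) group, ordered by Id,
# so RN > 1 keeps only the extra copies.
extras = conn.execute(
    """
    WITH CTE AS (
        SELECT Id, Name, Age, Comments,
               ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Id) AS RN
        FROM customers
    )
    SELECT Id, Name, Age, Comments FROM CTE WHERE RN > 1
    """
).fetchall()
print(extras)  # [(2, 'Ann', 30, 'second')]
```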
Answer by Martin Silovsky
This selects/deletes all duplicate records except one record from each group of duplicates. So the delete leaves all unique records plus one record from each group of duplicates.
Select duplicates:
SELECT *
FROM your_table
WHERE
id NOT IN (
SELECT MIN(id)
FROM your_table
GROUP BY column1, column2
);
Delete duplicates:
DELETE FROM your_table
WHERE
id NOT IN (
SELECT MIN(id)
FROM your_table
GROUP BY column1, column2
);
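A minimal sketch of the MIN(id) approach, assuming Python's sqlite3 module and a hypothetical table my_table with made-up rows. The select and the delete share the same "keepers" subquery, so the rows listed are exactly the rows removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "a", "x"), (3, "b", "y"), (4, "a", "x"), (5, "b", "z")],
)

# The lowest id in each (column1, column2) group is the row that survives.
KEEPERS = "SELECT MIN(id) FROM my_table GROUP BY column1, column2"

dupes = conn.execute(
    f"SELECT id FROM my_table WHERE id NOT IN ({KEEPERS}) ORDER BY id"
).fetchall()
conn.execute(f"DELETE FROM my_table WHERE id NOT IN ({KEEPERS})")
remaining = conn.execute("SELECT id FROM my_table ORDER BY id").fetchall()

print(dupes)      # [(2,), (4,)]
print(remaining)  # [(1,), (3,), (5,)]
```

NOT IN is safe here because id is a primary key, so MIN(id) never returns NULL.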
Be aware that with large numbers of records this can cause performance problems.
Answer by xDBA
If you work with Oracle, this approach is preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
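SQLite also keeps an implicit rowid for ordinary tables, so the same idea can be sketched with Python's sqlite3 module. SQLite's rowid is an integer key, not Oracle's physical ROWID, so this is only an analogy; the email values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain INTEGER column (not INTEGER PRIMARY KEY), so SQLite keeps a separate
# implicit rowid, loosely analogous to Oracle's ROWID pseudocolumn.
conn.execute("CREATE TABLE my_users (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO my_users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@example.com"),
        (2, "Sam", "asd@example.com"),
        (3, "Tom", "asd@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "asd@example.com"),
    ],
)

# Every row except the first-inserted one of each (name, email) group.
extra_copies = conn.execute(
    """
    SELECT *
    FROM my_users
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM my_users GROUP BY name, email)
    """
).fetchall()
print(extra_copies)  # [(5, 'Tom', 'asd@example.com')]
```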