Find duplicate records in a table using SQL Server
Source: http://stackoverflow.com/questions/9849846/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Sahil
I am validating a table that holds transaction-level data for an eCommerce site, and I need to find the exact errors.
I would like your help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is:
OrderNo  shoppername  amountpayed  city  Item
1        Sam          10           A     Iphone
1        Sam          10           A     Iphone       <-- duplicate to be detected
1        Sam          5            A     Ipod
2        John         20           B     Macbook
3        John         25           B     Macbookair
4        Hyman        5            A     Ipod
Suppose I use the query below:
Select shoppername, count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
It returns:
Sam 2
John 2
But I don't want to find duplicates over just 1 or 2 columns. I want to find rows that are duplicated across all the columns together. I want the result to be:
1 Sam 10 A Iphone
Answered by Sathya Narayanan
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
You can remove the duplicates by replacing the final select statement with:
delete x where rn > 1
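As a rough sketch of how this applies to the question's sample data, the script below loads the six sample rows into a temporary table and partitions by all five columns instead of only OrderNo and item, so a row is flagged only when every column matches. The table name #temp1 comes from the answer; the column types and the use of ORDER BY (SELECT NULL) are assumptions made for illustration.

```sql
-- Sample data from the question; the column types are assumed.
CREATE TABLE #temp1 (
    OrderNo     int,
    shoppername varchar(50),
    amountpayed int,
    city        varchar(10),
    Item        varchar(50)
);

INSERT INTO #temp1 (OrderNo, shoppername, amountpayed, city, Item)
VALUES (1, 'Sam',   10, 'A', 'Iphone'),
       (1, 'Sam',   10, 'A', 'Iphone'),
       (1, 'Sam',    5, 'A', 'Ipod'),
       (2, 'John',  20, 'B', 'Macbook'),
       (3, 'John',  25, 'B', 'Macbookair'),
       (4, 'Hyman',  5, 'A', 'Ipod');

-- Number the rows inside each group of fully identical rows.
-- The copies are indistinguishable, so ORDER BY (SELECT NULL) is used:
-- it does not matter which copy receives rn = 1.
WITH x AS (
    SELECT *,
           rn = ROW_NUMBER() OVER (
                    PARTITION BY OrderNo, shoppername, amountpayed, city, Item
                    ORDER BY (SELECT NULL))
    FROM #temp1
)
SELECT * FROM x
WHERE rn > 1;   -- every copy after the first one in its group

DROP TABLE #temp1;
```

On the sample data this returns a single row, the second copy of 1 / Sam / 10 / A / Iphone, with rn = 2.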
Answered by Eugene
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
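To try this query without touching a real table, one option is to feed it the question's sample rows through a table value constructor. This is only a sketch: the derived-table alias sales stands in for dbo.sales, and the literal values are copied from the question.

```sql
-- The six sample rows from the question, supplied inline.
SELECT OrderNo, shoppername, amountPayed, city, item, COUNT(*) AS cnt
FROM (VALUES (1, 'Sam',   10, 'A', 'Iphone'),
             (1, 'Sam',   10, 'A', 'Iphone'),
             (1, 'Sam',    5, 'A', 'Ipod'),
             (2, 'John',  20, 'B', 'Macbook'),
             (3, 'John',  25, 'B', 'Macbookair'),
             (4, 'Hyman',  5, 'A', 'Ipod')
     ) AS sales (OrderNo, shoppername, amountPayed, city, item)
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1;   -- keep only groups that occur more than once
```

Only the group 1 / Sam / 10 / A / Iphone appears twice, so the query returns that one row with cnt = 2.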
Answered by MUEKSH KUMAR
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Answered by GolezTrol
Just add all fields to the query, and remember to add them to the GROUP BY as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
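Typing 50 column names twice is tedious, so one possible shortcut is to build the column list from the catalog and run the statement dynamically. The sketch below assumes SQL Server 2017 or later (for STRING_AGG), that the table is dbo.sales as in the question, and that none of its columns has a type that cannot be grouped; the variable names @cols and @sql are made up for the example.

```sql
-- Build "SELECT <all columns>, COUNT(*) ... GROUP BY <all columns>" for dbo.sales.
-- Assumption: no column uses a type that GROUP BY rejects (e.g. xml, text, ntext, image).
DECLARE @cols nvarchar(max);
DECLARE @sql  nvarchar(max);

-- QUOTENAME guards against unusual column names; the CAST avoids the
-- length limit that STRING_AGG applies to non-max input types.
SELECT @cols = STRING_AGG(CAST(QUOTENAME(name) AS nvarchar(max)), N', ')
                   WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE object_id = OBJECT_ID(N'dbo.sales');

SET @sql = N'SELECT ' + @cols + N', COUNT(*) AS cnt'
         + N' FROM dbo.sales'
         + N' GROUP BY ' + @cols
         + N' HAVING COUNT(*) > 1;';

PRINT @sql;                    -- inspect the generated statement first
EXEC sys.sp_executesql @sql;   -- then run it
```

Printing the statement before executing it makes it easy to check the generated column list, or to copy it into a hand-written query instead.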
Answered by Abhinav Singh
To list the records that occur more than once, use the following query:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Answered by wqw
Answered by Ragavendhran N
Try this:
with T1 AS
(
SELECT LastName, COUNT(1) AS [COUNT] FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
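The join back to the base table is what lets this query show every copy of a duplicated row with all of its columns. A windowed COUNT(*) is a different technique, not the one used in the answer, that gives a similar result without the join. The sketch below applies it to the question's sample rows; the CTE names sales and counted and the column alias copies are chosen for illustration.

```sql
-- Inline copy of the question's sample rows, standing in for dbo.sales.
WITH sales AS (
    SELECT *
    FROM (VALUES (1, 'Sam',   10, 'A', 'Iphone'),
                 (1, 'Sam',   10, 'A', 'Iphone'),
                 (1, 'Sam',    5, 'A', 'Ipod'),
                 (2, 'John',  20, 'B', 'Macbook'),
                 (3, 'John',  25, 'B', 'Macbookair'),
                 (4, 'Hyman',  5, 'A', 'Ipod')
         ) AS v (OrderNo, shoppername, amountpayed, city, Item)
),
counted AS (
    -- Attach to each row the number of rows that share all five values.
    SELECT *,
           COUNT(*) OVER (
               PARTITION BY OrderNo, shoppername, amountpayed, city, Item
           ) AS copies
    FROM sales
)
SELECT *
FROM counted
WHERE copies > 1;   -- every member of a duplicated group, not just the extras
```

Here both copies of 1 / Sam / 10 / A / Iphone come back, each with copies = 2.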
Answered by sampath acharya
You can use either of the methods below to find the duplicates:
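-- Method 1: ROW_NUMBER() in a common table expression
with Ctec AS
(
select *, Row_number() over(partition by name order by Name) as Rnk
from Table_A
)
select Name from ctec
where rnk > 1;
-- Method 2: GROUP BY with HAVING
select name from Table_A
group by name
having count(*) > 1;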
Answered by LONG
First of all, I suspect the result shown in the question is not accurate: there appear to be three 'Sam' rows in the original table. But that is not critical to the question.
Now to the question itself. Based on your table, the best way to show duplicate values is to use count(*) together with a GROUP BY clause. The query would look like this:
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
The reason is that all of the columns in your table together identify each record, which means two records are considered duplicates only when the values in every column are exactly the same. You also want to show all fields of the duplicate records, so the GROUP BY must not leave out any column; if it did, that column could not be shown, because you can only select columns that participate in the GROUP BY clause.
Now I would like to give you an example of WITH ... ROW_NUMBER() OVER (...), which uses a common table expression together with the ROW_NUMBER function.
Suppose you have a nearly identical table with one extra column called Shipping Date, whose value may differ even when the rest of the columns are the same. Here it is:
OrderNo  shoppername  amountpayed  city  Item        Shipping Date
1        Sam          10           A     Iphone      2016-01-01
1        Sam          10           A     Iphone      2016-02-02
1        Sam          5            A     Ipod        2016-03-03
2        John         20           B     Macbook     2016-04-04
3        John         25           B     Macbookair  2016-05-05
4        Hyman        5            A     Ipod        2016-06-06
Notice that row 2 is not a duplicate if you still treat all columns as a unit. But what if you want to treat such rows as duplicates as well? In that case you should use WITH ... ROW_NUMBER() OVER (...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) AS Identifier --if you consider the one with the later shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier != 1 --or use '> 1'
The above query returns the duplicate row together with its Shipping Date, for example:
OrderNo  shoppername  amountpayed  city  Item    Shipping Date  Identifier
1        Sam          10           A     Iphone  2016-02-02     2
Note that this row is different from the one with 2016-01-01. The 2016-02-02 row is the one returned because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: Shipping Date is NOT one of the columns used to decide whether records are duplicates, it only orders the rows within each group, so the row with 2016-02-02 is still a valid result for your question.
To summarize: using count(*) together with a GROUP BY clause is the best choice when you only want to show the columns listed in the GROUP BY clause; any column that does not participate in the GROUP BY will be missing from the result.
WITH ... ROW_NUMBER() OVER (...), on the other hand, is suitable for every scenario in which you want to find duplicate records; however, the query is a little more complicated to write and somewhat over-engineered compared to the former.
If your purpose is to delete duplicate records from the table, you have to use the latter approach: WITH ... ROW_NUMBER() OVER (...) ... DELETE FROM ... WHERE.
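A minimal sketch of that delete, run against a temporary copy of the Shipping Date table rather than dbo.sales: the temp table name #sales, the column types, and the choice to keep the earliest Shipping Date in each group are all assumptions for illustration. The transaction is rolled back at the end so the effect can be inspected safely.

```sql
-- Temporary stand-in for dbo.sales with the extra Shipping Date column.
CREATE TABLE #sales (
    OrderNo         int,
    shoppername     varchar(50),
    amountPayed     int,
    city            varchar(10),
    item            varchar(50),
    [Shipping Date] date
);

INSERT INTO #sales (OrderNo, shoppername, amountPayed, city, item, [Shipping Date])
VALUES (1, 'Sam',   10, 'A', 'Iphone',     '2016-01-01'),
       (1, 'Sam',   10, 'A', 'Iphone',     '2016-02-02'),
       (1, 'Sam',    5, 'A', 'Ipod',       '2016-03-03'),
       (2, 'John',  20, 'B', 'Macbook',    '2016-04-04'),
       (3, 'John',  25, 'B', 'Macbookair', '2016-05-05'),
       (4, 'Hyman',  5, 'A', 'Ipod',       '2016-06-06');

BEGIN TRANSACTION;

-- Deleting through the CTE removes the underlying rows of #sales.
-- Within each group the earliest Shipping Date gets Identifier = 1 and is kept.
WITH TABLEEXPRESSION AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]) AS Identifier
    FROM #sales
)
DELETE FROM TABLEEXPRESSION
WHERE Identifier > 1;

-- Check what is left before making the change permanent.
SELECT * FROM #sales ORDER BY OrderNo, [Shipping Date];

ROLLBACK TRANSACTION;   -- switch to COMMIT TRANSACTION once the result looks right

DROP TABLE #sales;
```

With the sample rows, the check query shows five rows: the 2016-02-02 Iphone row has been removed and the 2016-01-01 one is kept.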
Hope this helps!
Answered by user5758159
with x as (
select shoppername, count(shoppername) as cnt
from sales
group by shoppername
having count(shoppername) > 1)
select t.* from x, win_gp_pin1510 t
where x.shoppername = t.shoppername
order by t.shoppername