Find duplicate records in a table using SQL Server

Question

Asked by Sahil

I am validating a table that holds transaction-level data for an eCommerce site, and I need to find the exact errors.

I need help finding duplicate records in a 50-column table on SQL Server.

Suppose my data is:

OrderNo shoppername amountpayed city Item
1       Sam         10          A    Iphone
1       Sam         10          A    Iphone   <-- duplicate to be detected
1       Sam         5           A    Ipod
2       John        20          B    Macbook
3       John        25          B    Macbookair
4       Hyman       5           A    Ipod

Suppose I use the below query:

Select shoppername, count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1

This returns:

Sam  2
John 2

But I don't want to find duplicates over just 1 or 2 columns. I want to find duplicates across all the columns of my data together. I want the result to be:

1       Sam         10          A    Iphone

Answer 1

Answered by Sathya Narayanan

with x as (
    select *, rn = row_number() over (partition by OrderNo, item order by OrderNo)
    from #temp1
)
select * from x
where rn > 1

You can remove the duplicates by replacing the final select statement with:

delete x where rn > 1
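
A minimal sketch of the same approach applied to the sample data from the question, partitioning by every column rather than only OrderNo and item. The temp table #sales and its column types are assumptions made for illustration; for the real 50-column table, every column would have to be listed in PARTITION BY.

```sql
-- Hypothetical temp table mirroring the sample data in the question.
CREATE TABLE #sales (
    OrderNo     int,
    shoppername varchar(50),
    amountpayed int,
    city        varchar(10),
    Item        varchar(50)
);

INSERT INTO #sales (OrderNo, shoppername, amountpayed, city, Item) VALUES
    (1, 'Sam',   10, 'A', 'Iphone'),
    (1, 'Sam',   10, 'A', 'Iphone'),   -- exact duplicate of the row above
    (1, 'Sam',    5, 'A', 'Ipod'),
    (2, 'John',  20, 'B', 'Macbook'),
    (3, 'John',  25, 'B', 'Macbookair'),
    (4, 'Hyman',  5, 'A', 'Ipod');

-- Number the rows within each group of fully identical rows.
-- ORDER BY (SELECT NULL) is used because no ordering among identical rows is meaningful.
WITH x AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountpayed, city, Item
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM #sales
)
SELECT OrderNo, shoppername, amountpayed, city, Item
FROM x
WHERE rn > 1;   -- every copy after the first in each group

DROP TABLE #sales;
```

On this sample data the final select should return the single row 1, Sam, 10, A, Iphone, which is the result asked for in the question.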

Answer 2

Answered by Eugene

SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1

Answer 3

Answered by MUEKSH KUMAR

SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;

JOB       COUNT(JOB)
--------- ----------
ANALYST            2
CLERK              4
MANAGER            3
PRESIDENT          1
SALESMAN           4

Answer 4

Answered by GolezTrol

Just add all fields to the query and remember to add them to Group By as well.

Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
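
With 50 columns, typing the column list by hand is error-prone. As a rough sketch, the list could be generated from the catalog views and pasted into both the SELECT and the GROUP BY. This assumes SQL Server 2017 or later (for STRING_AGG) and that the table is dbo.sales as in the question.

```sql
-- Build a comma-separated, bracket-quoted list of every column in dbo.sales,
-- in table order, ready to paste into SELECT ... GROUP BY ...
SELECT STRING_AGG(CAST(QUOTENAME(name) AS nvarchar(max)), ', ')
           WITHIN GROUP (ORDER BY column_id) AS column_list
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.sales');
```

The cast to nvarchar(max) is there so that a long column list is not cut off by the default result length of STRING_AGG.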

Answer 5

Answered by Abhinav Singh

To get the list of records that occur more than once, use the following query:

select field1,field2,field3, count(*)
  from table_name
  group by field1,field2,field3
  having count(*) > 1

Answer 6

Answered by wqw

Try this instead

SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1

Read about the CHECKSUM function first, as there can be collisions: different rows can produce the same checksum.
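
Because of possible collisions, a checksum match is only a hint. One exact check that still avoids listing the columns is to compare the total row count with the count of distinct rows. This is a sketch assuming the table is dbo.sales and that none of its columns has a type that DISTINCT cannot compare (for example text, ntext, image or xml).

```sql
-- If total_rows is greater than distinct_rows, the table contains
-- rows that are identical across every column.
SELECT
    (SELECT COUNT(*) FROM dbo.sales) AS total_rows,
    (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM dbo.sales) AS d) AS distinct_rows;
```

This only tells you whether exact duplicates exist and how many extra rows there are; it does not show which rows they are.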

Answer 7

Answered by Ragavendhran N

Try this

with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING  COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName

Answer 8

Answered by sampath acharya

You can use either of the methods below:

-- Method 1: ROW_NUMBER() over a CTE
WITH Ctec AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS Rnk
    FROM Table_A
)
SELECT Name FROM Ctec
WHERE Rnk > 1;

-- Method 2: GROUP BY with HAVING
SELECT Name FROM Table_A
GROUP BY Name
HAVING COUNT(*) > 1;

Answer 9

Answered by LONG

First of all, I suspect the result shown in the question is not accurate: there appear to be three 'Sam' rows in the original table. But that is not critical to the question.

Now to the question itself. Based on your table, the best way to show duplicate values is to use count(*) with a GROUP BY clause. The query would look like this:

SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1

The reason is that all the columns of your table together identify each record, so records count as duplicates only when the values in every column are exactly the same. You also want to show all fields of the duplicate records, so the GROUP BY must not leave out any column: apart from aggregates, you can only select columns that take part in the GROUP BY clause.

Now I would like to give you an example of WITH ... ROW_NUMBER() OVER (...), which uses a common table expression together with the ROW_NUMBER function.

Suppose you have a nearly identical table but with one extra column called Shipping Date, whose value may differ even when the rest are the same. Here it is:

OrderNo shoppername amountpayed city Item       Shipping Date
1       Sam         10          A    Iphone     2016-01-01
1       Sam         10          A    Iphone     2016-02-02
1       Sam         5           A    Ipod       2016-03-03
2       John        20          B    Macbook    2016-04-04
3       John        25          B    Macbookair 2016-05-05
4       Hyman       5           A    Ipod       2016-06-06

Notice that row 2 is no longer a duplicate if you still take all columns as a unit. But what if you want to treat it as a duplicate in this case as well? You should use WITH ... ROW_NUMBER() OVER (...), and the query would look like this:

WITH TABLEEXPRESSION AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]  -- if you consider the one with the later shipping date as the duplicate
           ) AS Identifier
    FROM dbo.sales
)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier != 1  -- or use '> 1'

The above query returns the duplicate rows together with their Shipping Date, for example:

OrderNo shoppername amountpayed city Item   Shipping Date Identifier
1       Sam         10          A    Iphone 2016-02-02    2

Note that this row is different from the one with 2016-01-01. The 2016-02-02 row is picked out as the duplicate because the rows are partitioned by OrderNo, shoppername, amountPayed, city, item and ordered by [Shipping Date], and Shipping Date is NOT one of the columns considered when deciding whether records are duplicates. So the row with 2016-02-02 is still a valid result for your question.

To summarize: using count(*) together with a GROUP BY clause is the best choice when you only need to show the columns listed in the GROUP BY clause; any column that does not take part in the GROUP BY is lost from the result.

WITH ... ROW_NUMBER() OVER (...), on the other hand, is suitable in every scenario where you want to find duplicate records. However, the query is a little more complicated to write and somewhat over-engineered compared to the former.

If your purpose is to delete duplicate records from the table, you have to use the latter: WITH ... ROW_NUMBER() OVER (...) ... DELETE FROM ... WHERE.
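
As a hedged sketch of that delete, reusing the query above: it assumes dbo.sales has the [Shipping Date] column from the example, and it wraps the delete in a transaction that is rolled back, so nothing is removed until you have inspected the result and switched to a commit.

```sql
BEGIN TRANSACTION;

WITH TABLEEXPRESSION AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OrderNo, shoppername, amountPayed, city, item
               ORDER BY [Shipping Date]
           ) AS Identifier
    FROM dbo.sales
)
DELETE FROM TABLEEXPRESSION
WHERE Identifier > 1;   -- keeps the earliest-shipped row of each group

-- Inspect what is left before deciding whether to keep the change.
SELECT * FROM dbo.sales;

ROLLBACK TRANSACTION;   -- replace with COMMIT TRANSACTION once the result looks right
```

Deleting through the table expression removes the underlying rows from dbo.sales, which works here because the expression reads from a single table.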

Hope this helps!

Answer 10

Answered by user5758159

with x as (
    select shoppername, count(shoppername) as cnt
    from sales
    group by shoppername
    having count(shoppername) > 1
)
select t.*
from x
inner join win_gp_pin1510 t on x.shoppername = t.shoppername
order by t.shoppername
