Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/4434118/

Date: 2020-09-01 08:33:30  Source: igfitidea

Select statement to find duplicates on certain fields

Tags: sql, sql-server, tsql, sql-server-2008

Asked by JOE SKEET

Can you help me with SQL statements to find duplicates on multiple fields?

For example, in pseudo code:

select count(field1,field2,field3) 
from table 
where the combination of field1, field2, field3 occurs multiple times

From the above statement, if there are multiple occurrences, I would like to select every record except the first one.

Answered by Rajesh Chamarthi

To get the list of field combinations that occur in more than one record, you can use:

select field1,field2,field3, count(*)
  from table_name
  group by field1,field2,field3
  having count(*) > 1
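
To see what this query returns on concrete data, here is a minimal sketch. It runs the same GROUP BY / HAVING statement through Python's built-in sqlite3 module rather than SQL Server, purely so it can be executed anywhere. The sample rows and the id column are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; table_name and field1..field3 mirror the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, field1 INT, field2 INT, field3 INT)"
)
conn.executemany(
    "INSERT INTO table_name (field1, field2, field3) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 1, 1), (2, 2, 2), (3, 3, 3), (3, 3, 3)],
)

# One row per combination of (field1, field2, field3) that occurs more than once.
duplicate_groups = conn.execute(
    """
    SELECT field1, field2, field3, COUNT(*) AS occurrences
      FROM table_name
     GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1
     ORDER BY field1, field2, field3
    """
).fetchall()

for row in duplicate_groups:
    print(row)
```

Note that this lists each duplicated combination once with its count. It does not by itself return the individual rows after the first one.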

See this link for more information on how to delete the rows:

http://support.microsoft.com/kb/139444

Edit: As the other users mentioned, there should be a criterion for deciding how you define "first rows" before you use the approach in the link above. Based on that, you'll need to use an order by clause and a subquery if needed. If you can post some sample data, it would really help.
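
As one possible reading of that advice, the sketch below assumes the "first" row of each group is the one with the smallest id, and uses a grouped subquery to find it. The id column and the sample rows are assumptions, not part of the question, and the SQL is run through Python's sqlite3 module only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, field1 INT, field2 INT, field3 INT)"
)
# Hypothetical rows: (1,1,1) appears three times, (3,3,3) twice, (2,2,2) once.
conn.executemany(
    "INSERT INTO table_name (id, field1, field2, field3) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 1), (3, 2, 2, 2), (4, 1, 1, 1), (5, 3, 3, 3), (6, 3, 3, 3)],
)

# The subquery finds the smallest id per duplicated group (assumed to be "first");
# the outer query keeps every row of that group with a larger id.
later_duplicates = conn.execute(
    """
    SELECT t.id, t.field1, t.field2, t.field3
      FROM table_name t
      JOIN (SELECT field1, field2, field3, MIN(id) AS first_id
              FROM table_name
             GROUP BY field1, field2, field3
            HAVING COUNT(*) > 1) g
        ON g.field1 = t.field1 AND g.field2 = t.field2 AND g.field3 = t.field3
     WHERE t.id > g.first_id
     ORDER BY t.id
    """
).fetchall()

for row in later_duplicates:
    print(row)
```

If "first" should mean something else (earliest date, lowest price), MIN(id) would be replaced by whatever expression captures that criterion.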

Answered by Heinzi

You mention "the first one", so I assume that you have some kind of ordering on your data. Let's assume that your data is ordered by some field ID.

This SQL should get you the duplicate entries except for the first one. It basically selects all rows for which another row with (a) the same fields and (b) a lower ID exists. Performance won't be great, but it might solve your problem.

SELECT A.ID, A.field1, A.field2, A.field3
  FROM myTable A
 WHERE EXISTS (SELECT B.ID
                 FROM myTable B
                WHERE B.field1 = A.field1
                  AND B.field2 = A.field2
                  AND B.field3 = A.field3
                  AND B.ID < A.ID)
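
One caveat worth illustrating: a plain = comparison never matches NULL to NULL, so rows whose compared fields contain NULL are not reported by the query above. The sketch below shows the difference on made-up data and uses an explicit "both NULL" check as one possible workaround. It is run through Python's sqlite3 module for convenience; the helper names later_duplicate_ids and null_safe are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, field1 INT, field2 INT, field3 INT)"
)
# Hypothetical rows: IDs 1-2 are duplicates, IDs 3-4 are duplicates containing NULLs.
conn.executemany(
    "INSERT INTO myTable (ID, field1, field2, field3) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 1, 1), (3, None, None, 2), (4, None, None, 2)],
)

def later_duplicate_ids(match_condition):
    """Return IDs of rows for which an earlier row satisfies match_condition."""
    sql = f"""
        SELECT A.ID
          FROM myTable A
         WHERE EXISTS (SELECT 1
                         FROM myTable B
                        WHERE {match_condition}
                          AND B.ID < A.ID)
         ORDER BY A.ID
    """
    return [row[0] for row in conn.execute(sql)]

def null_safe(col):
    """Comparison that also treats two NULLs in the same column as equal."""
    return f"(B.{col} = A.{col} OR (B.{col} IS NULL AND A.{col} IS NULL))"

fields = ["field1", "field2", "field3"]
plain_condition = " AND ".join(f"B.{c} = A.{c}" for c in fields)
null_safe_condition = " AND ".join(null_safe(c) for c in fields)

plain_ids = later_duplicate_ids(plain_condition)          # misses the NULL group
null_safe_ids = later_duplicate_ids(null_safe_condition)  # includes the NULL group
print(plain_ids, null_safe_ids)
```

Whether NULLs should count as equal is a design decision. If the fields are declared NOT NULL, the original query needs no change.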

Answered by Nick Vaccaro

This is a fun solution with SQL Server 2005 that I like. I'm going to assume that by "for every record except for the first one", you mean that there is another "id" column that we can use to identify which row is "first".

SELECT id
    , field1
    , field2
    , field3
FROM
(
    SELECT id
        , field1
        , field2
        , field3
        , RANK() OVER (PARTITION BY field1, field2, field3 ORDER BY id ASC) AS [rank]
    FROM table_name
) a
WHERE [rank] > 1

Answered by manoj Verma

To see duplicate values:

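-- Number the rows within each group of equal name values;
-- rows numbered 2 and above are the duplicates.
with MYCTE as (
    select row_number() over (partition by name order by name) rown, *
    from tmptest
    )
select * from MYCTE where rown > 1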

Answered by Bradford Hoagland

If you're using SQL Server 2005 or later (and the tags for your question indicate SQL Server 2008), you can use ranking functions to return the duplicate records after the first one if using joins is less desirable or impractical for some reason. The following example shows this in action, where it also works with null values in the columns examined.

create table Table1 (
 Field1 int,
 Field2 int,
 Field3 int,
 Field4 int 
)

insert  Table1 
values    (1,1,1,1)
        , (1,1,1,2)
        , (1,1,1,3)
        , (2,2,2,1)
        , (3,3,3,1)
        , (3,3,3,2)
        , (null, null, 2, 1)
        , (null, null, 2, 3)

select    *
from     (select      Field1
                    , Field2
                    , Field3
                    , Field4
                    , row_number() over (partition by   Field1
                                                      , Field2
                                                      , Field3
                                         order by       Field4) as occurrence
          from      Table1) x
where     occurrence > 1

Notice after running this example that the first record out of every "group" is excluded, and that records with null values are handled properly.

If you don't have a column available to order the records within a group, you can use the partition-by columns as the order-by columns.
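
A minimal sketch of that last variant, assuming a table that has only the three compared columns and no key. It is run through Python's sqlite3 module so it is self-contained, which requires a SQLite build with window-function support (3.25 or later). The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Field1 INT, Field2 INT, Field3 INT)")
# Hypothetical rows with no column that could order them within a group.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 1, 1), (2, 2, 2), (None, None, 2), (None, None, 2)],
)

# The partition-by columns double as the order-by columns. Within a group all
# rows tie on these columns, so which copy is numbered 1 is arbitrary.
extra_copies = conn.execute(
    """
    SELECT Field1, Field2, Field3, occurrence
      FROM (SELECT Field1, Field2, Field3,
                   ROW_NUMBER() OVER (PARTITION BY Field1, Field2, Field3
                                      ORDER BY Field1, Field2, Field3) AS occurrence
              FROM Table1) x
     WHERE occurrence > 1
    """
).fetchall()

for row in extra_copies:
    print(row)
```

Because the rows in a group are identical on every selected column, the arbitrary choice of "first" does not matter here. It would matter if the table had further columns that differ between copies.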

Answered by Mr.X

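-- Sample data: each row holds a comma-separated list of size IDs
CREATE TABLE #tmp
(
    sizeId VARCHAR(MAX)
)

INSERT #tmp
VALUES ('44'),
       ('44,45,46'),
       ('44,45,46'),
       ('44,45,46'),
       ('44,45,46'),
       ('44,45,46'),
       ('44,45,46')

SELECT * FROM #tmp

-- Concatenate all rows into a single comma-separated string
DECLARE @SqlStr VARCHAR(MAX)

SELECT @SqlStr = STUFF((SELECT ',' + sizeId
              FROM #tmp
              ORDER BY sizeId
              FOR XML PATH('')), 1, 1, '')

-- Split the string back into individual items and return the most frequent one.
-- dbo.Split is not a built-in function; this assumes a user-defined table-valued
-- function that returns one row per item in a column named items.
SELECT TOP 1 * FROM (
    SELECT items, COUNT(*) AS Occurrence
      FROM dbo.Split(@SqlStr, ',')
     GROUP BY items
    HAVING COUNT(*) > 1
) K
ORDER BY K.Occurrence DESC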

Answered by daryosh setorg

Try this query to get a separate count for each field (each count(fieldN) counts only the non-NULL values of that field within the group):

select field1,count(field1) as field1Count,field2,count(field2) as field2Counts,field3, count(field3) as field3Counts
from table_name
group by field1,field2,field3
having count(*) > 1