MySQL: How to delete duplicate rows in SQL Server?

Disclaimer: this page is a translation of a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must attribute it to the original authors (not me) and cite the original: http://stackoverflow.com/questions/18390574/

Date: 2020-08-31 18:37:42  Source: igfitidea

How to delete duplicate rows in SQL Server?

mysql sql sql-server-2008 duplicates sql-delete

Asked by Fearghal

How can I delete duplicate rows where no unique row id exists?

My table is:

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

Answered by Tim Schmelter

I like CTEs and ROW_NUMBER, as the two combined allow us to see which rows are deleted (or updated); therefore, just change the DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (the result is different; I assume that it's due to a typo on your part)

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2
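Tim's tip of swapping DELETE FROM CTE for SELECT * FROM CTE to preview the deletion can be tried without a SQL Server instance. The following is a minimal sketch, not the answer's own code: it uses SQLite through Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or later), the table name table1 is assumed, and because SQLite cannot delete through a CTE, it deletes by SQLite's hidden rowid every row numbered 2 or higher.

```python
import sqlite3

# Sample data from the question: each row appears twice and there is no id column.
ROWS = [
    ("john", 1, 1, 1, 1, 1, 1),
    ("john", 1, 1, 1, 1, 1, 1),
    ("sally", 2, 2, 2, 2, 2, 2),
    ("sally", 2, 2, 2, 2, 2, 2),
]

# Same numbering as the answer's CTE. SQLite's implicit rowid lets the outer
# statement tell identical rows apart; ORDER BY rowid keeps the first-inserted copy.
CTE = """
WITH cte AS (
    SELECT rowid AS rid, col1,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY rowid) AS rn
    FROM table1
)
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (col1 TEXT, col2 INT, col3 INT, col4 INT,"
        " col5 INT, col6 INT, col7 INT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def preview_duplicates(conn):
    """The SELECT variant: list the rows the DELETE would remove."""
    return conn.execute(CTE + "SELECT col1, rn FROM cte WHERE rn > 1 ORDER BY col1").fetchall()


def delete_duplicates(conn):
    """The DELETE variant: remove every row numbered 2 or higher."""
    conn.execute(CTE + "DELETE FROM table1 WHERE rowid IN (SELECT rid FROM cte WHERE rn > 1)")
    conn.commit()


conn = make_db()
print(preview_duplicates(conn))  # [('john', 2), ('sally', 2)]
delete_duplicates(conn)
print(conn.execute("SELECT * FROM table1 ORDER BY col1").fetchall())
```

Running the preview first and the delete second mirrors the workflow the answer describes: the same numbered set drives both statements.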

This example determines duplicates by a single column, col1, because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY:

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
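To see why the extra PARTITION BY columns matter, the sketch below runs the same delete twice on a hypothetical two-column table t: once partitioned by col1 only and once by col1, col2. It again uses SQLite via Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+ for ROW_NUMBER), with the rowid in place of deleting through the CTE.

```python
import sqlite3


def dedupe(partition_cols):
    """Delete rows numbered > 1 within each partition and return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, col2 INT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [("john", 1), ("john", 1), ("john", 9)],  # the last row is not a true duplicate
    )
    cols = ", ".join(partition_cols)  # hard-coded column names only, never user input
    conn.execute(
        f"""
        WITH cte AS (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY rowid) AS rn
            FROM t
        )
        DELETE FROM t WHERE rowid IN (SELECT rid FROM cte WHERE rn > 1)
        """
    )
    return conn.execute("SELECT col1, col2 FROM t ORDER BY col2").fetchall()


print(dedupe(["col1"]))          # [('john', 1)]: ('john', 9) was deleted too
print(dedupe(["col1", "col2"]))  # [('john', 1), ('john', 9)]
```

Partitioning by a single column treats any rows that share col1 as duplicates, so listing every column that defines "the same row" is the safer default.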

Answered by Shamseer K

I would prefer a CTE for deleting duplicate rows from a SQL Server table.

I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

By keeping the original:

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)

DELETE FROM CTE WHERE RN<>1

Without keeping the original:

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
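The difference between the two CTEs can be checked on a small example. This is a sketch on SQLite through Python's sqlite3, not SQL Server (window functions need SQLite 3.25+), with an assumed two-column MyTable; since SQLite cannot delete through a CTE, both variants delete by rowid instead.

```python
import sqlite3

DATA = [("john", 1), ("john", 1), ("sally", 2), ("bob", 3), ("bob", 3)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (col1 TEXT, col2 INT)")
    conn.executemany("INSERT INTO MyTable VALUES (?, ?)", DATA)
    return conn


def keep_original(conn):
    # ROW_NUMBER: the first row of each group gets RN = 1 and survives.
    conn.execute("""
        WITH cte AS (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY rowid) AS rn
            FROM MyTable
        )
        DELETE FROM MyTable WHERE rowid IN (SELECT rid FROM cte WHERE rn <> 1)
    """)


def drop_all_copies(conn):
    # RANK: identical rows share a rank, so a rank that occurs more than once
    # marks a duplicated row, and every copy of it is deleted.
    conn.execute("""
        WITH cte AS (
            SELECT rowid AS rid, RANK() OVER (ORDER BY col1, col2) AS r
            FROM MyTable
        )
        DELETE FROM MyTable WHERE rowid IN (
            SELECT rid FROM cte
            WHERE r IN (SELECT r FROM cte GROUP BY r HAVING COUNT(*) > 1)
        )
    """)


for step in (keep_original, drop_all_copies):
    conn = make_db()
    step(conn)
    print(step.__name__, conn.execute("SELECT * FROM MyTable ORDER BY col1").fetchall())
```

With this data, keeping the original leaves one row each for bob, john and sally, while the RANK variant leaves only sally, the one row that was never duplicated.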

Answered by Aamir

Without using a CTE and ROW_NUMBER(), you can delete the records by using GROUP BY with the MAX function (this requires a unique ID column). Here is an example:

DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
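The MAX(ID) approach assumes the table has some unique ID column, which the question's table lacks, so the sketch below gives a hypothetical MyDuplicateTable an ID INTEGER PRIMARY KEY. It runs the same DELETE on SQLite via Python's sqlite3 as a stand-in for SQL Server; the highest ID in each group survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyDuplicateTable (ID INTEGER PRIMARY KEY,"
    " DuplicateColumn1 TEXT, DuplicateColumn2 INT, DuplicateColumn3 INT)"
)
conn.executemany(
    "INSERT INTO MyDuplicateTable VALUES (?, ?, ?, ?)",
    [(1, "john", 1, 1), (2, "john", 1, 1), (3, "sally", 2, 2), (4, "sally", 2, 2), (5, "bob", 3, 3)],
)

# Keep one row per group (the one with the largest ID) and delete the rest.
conn.execute("""
    DELETE FROM MyDuplicateTable
    WHERE ID NOT IN (
        SELECT MAX(ID)
        FROM MyDuplicateTable
        GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
    )
""")
conn.commit()

print(conn.execute("SELECT ID FROM MyDuplicateTable ORDER BY ID").fetchall())  # [(2,), (4,), (5,)]
```

Using MIN(ID) instead keeps the oldest row of each group. Because GROUP BY places NULLs in the same group, rows that match except for NULLs in the same grouping columns are also treated as duplicates.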

Answered by Shoja Hamid

DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)
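The two HAVING branches of this UNION together cover every url group (COUNT(*) is either 1 or greater than 1), so the subquery returns the same ids as a plain SELECT MIN(id) ... GROUP BY url. The sketch below checks that on SQLite via Python's sqlite3; the search table's shape (id, url) and the sample URLs are assumed.

```python
import sqlite3

URLS = ["a", "a", "b", "c", "c", "c"]

ORIGINAL = """
DELETE FROM search
WHERE id NOT IN (
    SELECT MIN(id) FROM search GROUP BY url HAVING COUNT(*) = 1
    UNION
    SELECT MIN(id) FROM search GROUP BY url HAVING COUNT(*) > 1
)
"""

SIMPLIFIED = """
DELETE FROM search
WHERE id NOT IN (SELECT MIN(id) FROM search GROUP BY url)
"""


def surviving_ids(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search (id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO search (url) VALUES (?)", [(u,) for u in URLS])
    conn.execute(sql)
    return [row[0] for row in conn.execute("SELECT id FROM search ORDER BY id")]


print(surviving_ids(ORIGINAL))    # [1, 3, 4]
print(surviving_ids(SIMPLIFIED))  # [1, 3, 4]
```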

Answered by Rhys

If you have no references, like foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO [newTable]
FROM [oldTable]

Go into Object Explorer and delete the old table.

Rename the new table with the old table's name.
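The copy, drop and rename steps can be sketched end to end. SQLite (used here through Python's sqlite3 as a stand-in for SQL Server and Object Explorer) has no SELECT ... INTO, so this version uses CREATE TABLE ... AS SELECT DISTINCT; the table names follow the placeholders above and the data is assumed. Like SELECT ... INTO, it copies only columns and data, not keys, indexes or constraints, so those would need to be re-created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oldTable (col1 TEXT, col2 INT, col3 INT)")
conn.executemany(
    "INSERT INTO oldTable VALUES (?, ?, ?)",
    [("john", 1, 1), ("john", 1, 1), ("sally", 2, 2), ("sally", 2, 2)],
)
conn.commit()

# 1. Copy one instance of each distinct row into a new table.
conn.execute("CREATE TABLE newTable AS SELECT DISTINCT col1, col2, col3 FROM oldTable")
# 2. Delete the old table.
conn.execute("DROP TABLE oldTable")
# 3. Give the new table the old table's name.
conn.execute("ALTER TABLE newTable RENAME TO oldTable")

print(conn.execute("SELECT * FROM oldTable ORDER BY col1").fetchall())
# [('john', 1, 1), ('sally', 2, 2)]
```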

Answered by Jithin Shaji

Please also see the following way of deleting duplicates.

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values 
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

I created a sample table variable named @table and loaded it with the given data.

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From    @table) aliasName 
Where   rowNumber > 1

Select * from @table

Note: if you include all columns in the PARTITION BY clause, the ORDER BY clause does not have much significance.

I know the question was asked three years ago, and my answer is another version of what Tim posted, but I am posting it in case it is helpful to anyone.

Answered by oabarca

Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:

SET ROWCOUNT 1;
DELETE FROM t1 WHERE myprimarykey=1;
SET ROWCOUNT 0; -- reset; otherwise later statements in the session stay limited to 1 row

myprimarykey is the identifier for the row.

I set rowcount to 1 because I only had two duplicated rows. If I had had three duplicated rows, I would have set rowcount to 2 so that it deletes the first two rows it sees and leaves only one in table t1.
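The rule "set the row count to one less than the number of copies" can be applied to every duplicated value in turn. SQLite (used here through Python's sqlite3 as a stand-in) has no SET ROWCOUNT, so this hedged sketch gets the same effect with a LIMIT inside a rowid subquery; t1 and myprimarykey follow the answer, while the payload column and the data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (myprimarykey INT, payload TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (2, "b"), (3, "c")],
)

# Find each duplicated value and how many copies it has.
dups = conn.execute(
    "SELECT myprimarykey, COUNT(*) FROM t1 GROUP BY myprimarykey HAVING COUNT(*) > 1"
).fetchall()

for key, copies in dups:
    # Counterpart of SET ROWCOUNT (copies - 1): delete all but one copy of this value.
    conn.execute(
        "DELETE FROM t1 WHERE rowid IN"
        " (SELECT rowid FROM t1 WHERE myprimarykey = ? LIMIT ?)",
        (key, copies - 1),
    )
conn.commit()

print(conn.execute(
    "SELECT myprimarykey, COUNT(*) FROM t1 GROUP BY myprimarykey ORDER BY myprimarykey"
).fetchall())  # [(1, 1), (2, 1), (3, 1)]
```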

Hope it helps anyone

Answered by Fezal halai

Try using:

SELECT linkorder
    ,Row_Number() OVER (
        PARTITION BY linkorder ORDER BY linkorder DESC
        ) AS RowNum
FROM u_links

Answered by Moshe Taieb

The solutions suggested above work for small and medium tables. For very large tables, I can suggest the following solution, since it runs in iterations.

  1. Drop all dependent views of LargeSourceTable. You can find the dependencies in SQL Server Management Studio: right-click the table and click "View Dependencies".
  2. Rename the table:

    EXEC sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp';
    GO

  3. Create LargeSourceTable again, but this time add a primary key on all the columns that define a duplicate, with WITH (IGNORE_DUP_KEY = ON). For example:

    CREATE TABLE [dbo].[LargeSourceTable] (
        ID int IDENTITY(1,1),
        [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
        [Column1] CHAR (36) NOT NULL,
        [Column2] NVARCHAR (100) NOT NULL,
        [Column3] CHAR (36) NOT NULL,
        PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
    );
    GO

  4. Re-create, for the newly created table, the views that you dropped in step 1.

  5. Now run the following SQL script. It copies the rows in pages of 1,000,000 and prints progress after each page; you can lower the number of rows per page to see progress more often.

  6. Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-incremented ID, which I am also copying.

SET IDENTITY_INSERT LargeSourceTable ON

DECLARE @PageNumber AS INT, @RowspPage AS INT
DECLARE @TotalRows AS INT
DECLARE @dt varchar(19)
SET @PageNumber = 0
SET @RowspPage = 1000000
SELECT @TotalRows = count(*) FROM LargeSourceTable_TEMP

While (@PageNumber * @RowspPage < @TotalRows)
Begin
    begin transaction tran_inner
        ; with cte as
        (
            SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
            OFFSET ((@PageNumber) * @RowspPage) ROWS
            FETCH NEXT @RowspPage ROWS ONLY
        )

        INSERT INTO LargeSourceTable 
        (
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        )       
        select 
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        from cte

    commit transaction tran_inner

    PRINT 'Page: ' + convert(varchar(10), @PageNumber)
    PRINT 'Transferred: ' + convert(varchar(20), @PageNumber * @RowspPage)
    PRINT 'Of: ' + convert(varchar(20), @TotalRows)

    SELECT @dt = convert(varchar(19), getdate(), 121)
    RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
    SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF
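The core of this approach is that the new table's primary key silently rejects repeated key values while rows are copied page by page. The sketch below reproduces that idea on SQLite via Python's sqlite3, not SQL Server: INSERT OR IGNORE plays the role of IGNORE_DUP_KEY = ON and LIMIT/OFFSET replaces OFFSET ... FETCH NEXT. The lower-case table and column names, the sample rows and the page size of 2 are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_temp (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)")
# The new table: its primary key covers the columns that define a duplicate.
conn.execute(
    "CREATE TABLE large (id INTEGER, column1 TEXT NOT NULL, column2 TEXT NOT NULL,"
    " column3 TEXT, PRIMARY KEY (column1, column2))"
)
keys = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("c", "z"), ("b", "y"), ("d", "w")]
conn.executemany(
    "INSERT INTO large_temp (id, column1, column2, column3) VALUES (?, ?, ?, ?)",
    [(i, c1, c2, "extra") for i, (c1, c2) in enumerate(keys, start=1)],
)
conn.commit()


def copy_in_pages(conn, page_size):
    """Copy large_temp into large one page at a time; return the number of pages."""
    total = conn.execute("SELECT COUNT(*) FROM large_temp").fetchone()[0]
    page = 0
    while page * page_size < total:
        with conn:  # commit each page separately, like tran_inner
            conn.execute(
                "INSERT OR IGNORE INTO large (id, column1, column2, column3)"
                " SELECT id, column1, column2, column3 FROM large_temp"
                " ORDER BY id LIMIT ? OFFSET ?",
                (page_size, page * page_size),
            )
        print(f"Page: {page}, processed {min((page + 1) * page_size, total)} of {total}")
        page += 1
    return page


pages = copy_in_pages(conn, page_size=2)
print(conn.execute("SELECT id FROM large ORDER BY id").fetchall())  # [(1,), (3,), (5,), (7,)]
```

Because pages are read in id order, the first copy of each key (the lowest id) is the one that lands in the new table.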

Answered by Bashirpour

There are two solutions in MySQL:

A) Delete duplicate rows using a DELETE JOIN statement

DELETE t1 FROM contacts t1
INNER JOIN contacts t2 
WHERE 
    t1.id < t2.id AND 
    t1.email = t2.email;

This query references the contacts table twice; therefore, it uses the table aliases t1 and t2.

The output is:

Query OK, 4 rows affected (0.10 sec)

The statement above keeps the row with the highest id. In case you want to delete duplicate rows and keep the lowest id instead, you can use the following statement:

DELETE c1 FROM contacts c1
INNER JOIN contacts c2 
WHERE
    c1.id > c2.id AND 
    c1.email = c2.email;
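SQLite has no multi-table DELETE ... JOIN, so this sketch (Python's sqlite3 as a stand-in for MySQL, with a made-up contacts table) expresses both statements as correlated EXISTS subqueries: the first keeps the highest id per email, the second the lowest.

```python
import sqlite3

CONTACTS = [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com"), (4, "a@x.com"), (5, "b@x.com"), (6, "c@x.com")]

# Delete a row if another row with the same email has a larger id
# (the t1.id < t2.id condition): the highest id survives.
KEEP_HIGHEST = """
DELETE FROM contacts
WHERE EXISTS (SELECT 1 FROM contacts AS t2
              WHERE t2.email = contacts.email AND t2.id > contacts.id)
"""

# Mirror image (the c1.id > c2.id condition): the lowest id survives.
KEEP_LOWEST = """
DELETE FROM contacts
WHERE EXISTS (SELECT 1 FROM contacts AS c2
              WHERE c2.email = contacts.email AND c2.id < contacts.id)
"""


def run(sql):
    """Return (rows deleted, ids left) for one statement on fresh data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO contacts VALUES (?, ?)", CONTACTS)
    deleted = conn.execute(sql).rowcount
    ids = [row[0] for row in conn.execute("SELECT id FROM contacts ORDER BY id")]
    return deleted, ids


print(run(KEEP_HIGHEST))  # (3, [4, 5, 6])
print(run(KEEP_LOWEST))   # (3, [1, 2, 6])
```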

B) Delete duplicate rows using an intermediate table

The following shows the steps for removing duplicate rows using an intermediate table:

    1. Create a new table whose structure is the same as the original table from which you want to delete duplicate rows.

    2. Insert distinct rows from the original table into the intermediate table.

    3. Drop the original table and rename the intermediate table to the original one.

Step 1. Create a new table whose structure is the same as the original table:

CREATE TABLE source_copy LIKE source;

Step 2. Insert distinct rows from the original table into the new table:

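INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values
-- Note: with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), this
-- SELECT * ... GROUP BY col is rejected unless the other columns depend on col.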

Step 3. Drop the original table and rename the intermediate table to the original name:

DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
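Step 2's SELECT * ... GROUP BY col lets MySQL pick an indeterminate row from each group. A deterministic variant copies the row with the smallest id per value of col. The sketch below runs the three steps on SQLite through Python's sqlite3 as a stand-in; because SQLite has no CREATE TABLE ... LIKE, the copy is created from the same column definitions by hand, and the id, col and other columns are assumed.

```python
import sqlite3

COLUMNS = "id INTEGER PRIMARY KEY, col TEXT, other TEXT"

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE source ({COLUMNS})")
conn.executemany(
    "INSERT INTO source (col, other) VALUES (?, ?)",
    [("a", "first a"), ("a", "second a"), ("b", "only b"), ("a", "third a")],
)
conn.commit()

# Step 1: a new table with the same structure (stand-in for CREATE TABLE ... LIKE).
conn.execute(f"CREATE TABLE source_copy ({COLUMNS})")

# Step 2: copy one row per value of col, choosing the smallest id deterministically.
conn.execute(
    "INSERT INTO source_copy"
    " SELECT * FROM source WHERE id IN (SELECT MIN(id) FROM source GROUP BY col)"
)

# Step 3: drop the original table and rename the copy.
conn.execute("DROP TABLE source")
conn.execute("ALTER TABLE source_copy RENAME TO source")
conn.commit()

print(conn.execute("SELECT * FROM source ORDER BY id").fetchall())
# [(1, 'a', 'first a'), (3, 'b', 'only b')]
```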

Source: http://www.mysqltutorial.org/mysql-delete-duplicate-rows/
