Original question: http://stackoverflow.com/questions/18390574/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to delete duplicate rows in SQL Server?
Asked by Fearghal
How can I delete duplicate rows where no unique row id exists?
My table is:
col1 col2 col3 col4 col5 col6 col7
john 1 1 1 1 1 1
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
sally 2 2 2 2 2 2
I want to be left with the following after the duplicate removal:
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:
DELETE
FROM table
WHERE col1 IN (
SELECT id
FROM table
GROUP BY id
HAVING (COUNT(col1) > 1)
)
Answered by Tim Schmelter
I like CTEs and ROW_NUMBER, as the two combined allow us to see which rows would be deleted (or updated): just change the DELETE FROM CTE... to SELECT * FROM CTE:
WITH CTE AS(
SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1
DEMO (the result is different; I assume that's due to a typo on your part)
COL1 COL2 COL3 COL4 COL5 COL6 COL7
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
This example determines duplicates by a single column, col1, because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY:
ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
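Putting the two ideas together for the question's table, a minimal sketch might look like the following. It assumes the data lives in a scratch temp table named #Table1 (a made-up name for this illustration) and partitions by all seven columns, so only fully identical rows count as duplicates. The first statement is the preview, the second is the actual delete.

```sql
-- Hypothetical scratch table holding the question's sample data.
CREATE TABLE #Table1 (col1 varchar(10), col2 int, col3 int, col4 int,
                      col5 int, col6 int, col7 int);
INSERT INTO #Table1 VALUES
    ('john',  1, 1, 1, 1, 1, 1),
    ('john',  1, 1, 1, 1, 1, 1),
    ('sally', 2, 2, 2, 2, 2, 2),
    ('sally', 2, 2, 2, 2, 2, 2);

-- Preview: every row returned here (RN > 1) is one the DELETE would remove.
WITH CTE AS (
    SELECT col1, col2, col3, col4, col5, col6, col7,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY col1)
    FROM #Table1
)
SELECT * FROM CTE WHERE RN > 1;

-- Same CTE, now deleting instead of selecting.
WITH CTE AS (
    SELECT RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY col1)
    FROM #Table1
)
DELETE FROM CTE WHERE RN > 1;

-- Check what is left, then clean up the scratch table.
SELECT * FROM #Table1;
DROP TABLE #Table1;
```

Running the preview first is the point of the approach: the delete reuses exactly the same partitioning, so what you saw is what gets removed.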
Answered by Shamseer K
I prefer a CTE for deleting duplicate rows from a SQL Server table.
I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/
Keeping the original row:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN<>1
Without keeping the original row (every copy is deleted):
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
Answered by Aamir
Without using a CTE and ROW_NUMBER(), you can delete the records just by using GROUP BY with the MAX function. Here is an example:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
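The question's table has no ID column, so the query above cannot run against it as written. One possible workaround, sketched here under the assumption that you are allowed to alter the table, is to add a temporary IDENTITY column, delete by it, and drop it again. The table name #MyDuplicateTable, its three columns, and the helper column TempId are made up for this illustration, and the GO lines assume a client such as SSMS or sqlcmd.

```sql
-- Hypothetical scratch table with duplicate rows and no id column.
CREATE TABLE #MyDuplicateTable (col1 varchar(10), col2 int, col3 int);
INSERT INTO #MyDuplicateTable VALUES
    ('john', 1, 1), ('john', 1, 1), ('sally', 2, 2), ('sally', 2, 2);
GO

-- Add a surrogate key; SQL Server numbers the existing rows automatically.
ALTER TABLE #MyDuplicateTable ADD TempId int IDENTITY(1,1);
GO

-- Keep the row with the highest TempId in each group of duplicates.
DELETE FROM #MyDuplicateTable
WHERE TempId NOT IN (
    SELECT MAX(TempId)
    FROM #MyDuplicateTable
    GROUP BY col1, col2, col3
);
GO

-- Remove the helper column again and inspect the result.
ALTER TABLE #MyDuplicateTable DROP COLUMN TempId;
GO
SELECT * FROM #MyDuplicateTable;
```

The batches are separated with GO so that the statements referencing TempId are compiled only after the column exists.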
Answered by Shoja Hamid
DELETE from search
where id not in (
select min(id) from search
group by url
having count(*)=1
union
SELECT min(id) FROM search
group by url
having count(*) > 1
)
Answered by Rhys
If you have no references, like foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.
SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO [newTable]
FROM [oldTable] -- the original table; placeholder name, not given in the answer
Go into the object explorer and delete the old table.
Rename the new table with the old table's name.
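The Object Explorer steps can also be scripted. This is a minimal sketch, assuming the original table is called dbo.oldTable (a placeholder; the answer does not name it) and that nothing else references it. Keep in mind that SELECT ... INTO copies only the data and column definitions, not indexes, constraints or permissions.

```sql
-- Hypothetical original table with the question's sample data.
CREATE TABLE dbo.oldTable (col1 varchar(10), col2 int, col3 int, col4 int,
                           col5 int, col6 int, col7 int);
INSERT INTO dbo.oldTable VALUES
    ('john',  1, 1, 1, 1, 1, 1),
    ('john',  1, 1, 1, 1, 1, 1),
    ('sally', 2, 2, 2, 2, 2, 2),
    ('sally', 2, 2, 2, 2, 2, 2);

BEGIN TRANSACTION;

-- Copy one instance of every distinct row into a new table.
SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO dbo.newTable
FROM dbo.oldTable;

-- Replace the old table with the de-duplicated copy.
DROP TABLE dbo.oldTable;
EXEC sp_rename 'dbo.newTable', 'oldTable';

COMMIT TRANSACTION;

SELECT * FROM dbo.oldTable;
```

Wrapping the swap in a transaction is a design choice of this sketch, so that a failure partway through does not leave the database without the original table.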
Answered by Jithin Shaji
Please see the following way of deleting, too.
Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)
This creates a sample table variable named @table and loads it with the given data.
Delete aliasName from (
Select *,
ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From @table) aliasName
Where rowNumber > 1
Select * from @table
Note: If you list all columns in the PARTITION BY part, then the ORDER BY does not have much significance.
I know the question was asked three years ago, and my answer is another version of what Tim has posted, but I'm posting it just in case it is helpful to anyone.
Answered by oabarca
Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444
In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:
SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;
myprimarykey is the identifier for the row.
I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated, then I would have set rowcount to 2 so that it deletes the first two that it sees and leaves only one in table t1.
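Microsoft's documentation advises against SET ROWCOUNT for DELETE, INSERT and UPDATE in new work and points to the TOP syntax instead. A rough equivalent of the statement above, sketched with a scratch temp table #t1 whose columns are invented for the example, could be:

```sql
-- Hypothetical scratch table: key 1 appears twice, key 2 once.
CREATE TABLE #t1 (myprimarykey int, name varchar(10));
INSERT INTO #t1 VALUES (1, 'john'), (1, 'john'), (2, 'sally');

-- Delete exactly one of the rows with key 1; the other copy stays.
DELETE TOP (1) FROM #t1 WHERE myprimarykey = 1;

-- One 'john' row and one 'sally' row remain.
SELECT * FROM #t1;
DROP TABLE #t1;
```

If you do use SET ROWCOUNT, remember that it stays in effect for the session until you reset it with SET ROWCOUNT 0.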
Hope it helps someone.
Answered by Fezal halai
Answered by Moshe Taieb
After trying the solutions suggested above, which work for small and medium tables, I can suggest this solution for very large tables, since it runs in iterations.
- Drop all views that depend on LargeSourceTable.
- You can find the dependencies in SQL Server Management Studio: right-click the table and click "View Dependencies".
- Rename the table:
sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp';
GO
- Create LargeSourceTable again, but this time add a primary key on all the columns that define a duplicate, and add WITH (IGNORE_DUP_KEY = ON).
For example:
CREATE TABLE [dbo].[LargeSourceTable]
(
    ID int IDENTITY(1,1),
    [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
    [Column1] CHAR (36) NOT NULL,
    [Column2] NVARCHAR (100) NOT NULL,
    [Column3] CHAR (36) NOT NULL,
    PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
);
GO
- Recreate the views that you dropped in the first step, so that they point to the newly created table.
- Now run the following SQL script. It copies the rows in pages of 1,000,000 rows and reports progress after each page; you can lower the number of rows per page to see progress more often.
Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-incrementing id, which I'm also copying.
SET IDENTITY_INSERT LargeSourceTable ON
DECLARE @PageNumber AS INT, @RowspPage AS INT
DECLARE @TotalRows AS INT
declare @dt varchar(19)
SET @PageNumber = 0
SET @RowspPage = 1000000
select @TotalRows = count (*) from LargeSourceTable_TEMP
While (@PageNumber * @RowspPage < @TotalRows) -- page numbers start at 0, so stop once the offset passes the last row
Begin
begin transaction tran_inner
; with cte as
(
SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
OFFSET ((@PageNumber) * @RowspPage) ROWS
FETCH NEXT @RowspPage ROWS ONLY
)
INSERT INTO LargeSourceTable
(
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
)
select
ID
,[CreateDate]
,[Column1]
,[Column2]
,[Column3]
from cte
commit transaction tran_inner
PRINT 'Page: ' + convert(varchar(10), @PageNumber)
PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
PRINT 'Of: ' + convert(varchar(20), @TotalRows)
SELECT @dt = convert(varchar(19), getdate(), 121)
RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
SET @PageNumber = @PageNumber + 1
End
SET IDENTITY_INSERT LargeSourceTable OFF
Answered by Bashirpour
There are two solutions in MySQL:
A) Delete duplicate rows using the DELETE JOIN statement
DELETE t1 FROM contacts t1
INNER JOIN contacts t2
WHERE
t1.id < t2.id AND
t1.email = t2.email;
This query references the contacts table twice; therefore, it uses the table aliases t1 and t2.
The output is:
Query OK, 4 rows affected (0.10 sec)
The statement above keeps the row with the highest id for each email. In case you want to delete duplicate rows and keep the lowest id instead, you can use the following statement:
DELETE c1 FROM contacts c1
INNER JOIN contacts c2
WHERE
c1.id > c2.id AND
c1.email = c2.email;
B) Delete duplicate rows using an intermediate table
The following shows the steps for removing duplicate rows using an intermediate table:
1. Create a new table with the same structure as the original table whose duplicate rows you want to delete.
2. Insert distinct rows from the original table into the intermediate table.
3. Drop the original table and rename the intermediate table to the original table's name.
Step 1. Create a new table whose structure is the same as the original table:
CREATE TABLE source_copy LIKE source;
Step 2. Insert distinct rows from the original table to the new table:
INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values
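On MySQL 5.7.5 and later, ONLY_FULL_GROUP_BY is enabled by default, so SELECT * combined with GROUP BY col is rejected unless every other column is functionally dependent on col. When the duplicate rows are identical in every column, as in the question, a sketch using SELECT DISTINCT avoids that problem. The table names source and source_copy come from the steps above; the two-column layout below is assumed only for illustration.

```sql
-- Hypothetical source table containing fully identical duplicate rows.
CREATE TABLE source (col VARCHAR(10), val INT);
INSERT INTO source VALUES ('john', 1), ('john', 1), ('sally', 2), ('sally', 2);

-- Step 1 from the text: an empty copy with the same structure.
CREATE TABLE source_copy LIKE source;

-- DISTINCT keeps one copy of each fully identical row.
INSERT INTO source_copy
SELECT DISTINCT * FROM source;

-- Two rows remain: one for john, one for sally.
SELECT * FROM source_copy;
```

If the rows differ in some columns and only col defines a duplicate, DISTINCT is not enough, and you have to decide explicitly which row of each group to keep.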
Step 3. Drop the original table and rename the intermediate table to the original one:
DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;
Source: http://www.mysqltutorial.org/mysql-delete-duplicate-rows/