从表中删除重复记录 - SQL 查询
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1748459/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Deleting duplicate record from table - SQL query
提问by Santanu
I need to delete duplicate rows only from the table, like I have 3 duplicate rows in the table, my query will delete 2 rows from 3 duplicated rows.
我只需要从表中删除重复的行,就像我在表中有 3 个重复的行一样,我的查询将从 3 个重复的行中删除 2 行。
How can I get this? Please help me.
我怎样才能得到这个?请帮我。
回答by Muhammad Akhtar
Please try the below query, it will definitely meet your objective
请尝试以下查询,它一定会满足您的目标
SET ROWCOUNT 1
DELETE test
FROM test a
WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
WHILE @@rowcount > 0
DELETE test
FROM test a
WHERE (SELECT COUNT(*) FROM test b WHERE b.name = a.name) > 1
SET ROWCOUNT 0
where testis your table name
其中test是您的表名
回答by Mark Brittingham
This works in SQL Server although it isn't a single statement:
这在 SQL Server 中有效,尽管它不是单个语句:
Declare @cnt int;
Select @cnt=COUNT(*) From DupTable Where (Col1=1); -- Assumes you are trying to delete the duplicates where some condition (e.g. Col1=1) is true.
Delete Top (@cnt-1) From DupTable
It also doesn't require any extraassumptions (like the existance of another column that makes each row unique). After all, Santanu did say that the rowswere duplicates and not just the one column.
它还不需要任何额外的假设(例如存在另一列使每一行都是唯一的)。毕竟,Santanu 确实说过这些行是重复的,而不仅仅是一列。
However, the rightanswer, in my view, is to get a real table structure. That is, add an IDENTITY column to this table so that you can use a single SQL command to do your work. Like this:
然而,在我看来,正确的答案是获得一个真正的表结构。也就是说,向该表添加一个 IDENTITY 列,以便您可以使用单个 SQL 命令来完成您的工作。像这样:
ALTER TABLE dbo.DupTable ADD
IDCol int NOT NULL IDENTITY (1, 1)
GO
Then the delete is trivial:
然后删除是微不足道的:
DELETE FROM DupTable WHERE IDCol NOT IN
(SELECT MAX(IDCol) FROM DupTable GROUP BY Col1, Col2, Col3)
回答by jensgram
DELETE FROM Table t1, Table t2 WHERE t1.colDup = t2.colDup AND t1.date < t2.date
Will delete every duplicate row from Table
(on column colDup
) except the oldest (i.e. lowset date
).
将从Table
(列colDup
)中删除所有重复行,除了最旧的(即 lowset date
)。
回答by Dor
DELETE FROM `mytbl`
INNER JOIN (
SELECT 1 FROM `mytbl`
GROUP BY `duplicated_column` HAVING COUNT(*)=2
) USING(`id`)
Edit:
编辑:
My bad, the above query won't work.
我的错,上面的查询不起作用。
Assuming table structure:
假设表结构:
id
int auto_increment
id
int auto_increment
num
int # <-- this is the column with duplicated values
num
int # <-- 这是具有重复值的列
The following query would work in MySQL (i checked):
以下查询将在 MySQL 中工作(我检查过):
DELETE `mytbl` FROM `mytbl`
INNER JOIN
(
SELECT `num` FROM `mytbl`
GROUP BY `num` HAVING COUNT(*)=2
) AS `tmp` USING (`num`)
The query would delete the rows that have 2 (not more or else) duplicated values in the num
column.
该查询将删除列中具有 2 个(不是更多或其他)重复值的num
行。
Edit (again):
编辑(再次):
I suggest to add a key on the num
column.
我建议在num
列上添加一个键。
Edit(#3):
编辑(#3):
In case that the author wanted to delete the duplicated rows, the following should work for MySQL (it worked for me):
如果作者想删除重复的行,以下内容应该适用于 MySQL(它对我有用):
DELETE `delete_duplicated_rows` FROM `delete_duplicated_rows`
NATURAL JOIN (
SELECT *
FROM `delete_duplicated_rows`
GROUP BY `num1` HAVING COUNT(*)=2
) AS `der`
While assuming table structure is:
假设表结构是:
CREATE TABLE `delete_duplicated_rows` (
`num1` tinyint(4) DEFAULT NOT NULL,
`num2` tinyint(4) DEFAULT NOT NULL
) ENGINE=MyISAM;
回答by Danil
I think each table has unique identifier. So if it exists then you can write following query: Delete Table1 from Table1 t1 where 2 >= (select count(id) from Table1 where dupColumn = t1.dupColumn) and t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)
我认为每个表都有唯一的标识符。因此,如果它存在,那么您可以编写以下查询: Delete Table1 from Table1 t1 where 2 >= (select count(id) from Table1 where dupColumn = t1.dupColumn) and t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)
OOps. It seems it is possible to use second filter only Delete Table1 from Table1 t1 where t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)
哎呀。似乎可以只使用第二个过滤器 Delete Table1 from Table1 t1 where t1.id not in (select max (id) from Table1 where dupColumn = t1.dupColumn)
回答by Danil
If you have the id's of the rows you want to delete then...
如果您有要删除的行的 ID,则...
DELETE FROM table WHERE id IN (1, 4, 7, [id numbers to delete...])
回答by Yordan Georgiev
-- Just to demonstrates Marks example
.
-- START === 1.0.dbo..DuplicatesTable.TableCreate.sql
/****** Object: Table [dbo].[DuplicatesTable]
Script Date: 03/29/2010 21:24:02 ******/
IF EXISTS (SELECT * FROM sys.objects
WHERE
object_id = OBJECT_ID(N'[dbo].[DuplicatesTable]')
AND type in (N'U'))
DROP TABLE [dbo].[DuplicatesTable]
GO
/****** Object: Table [dbo].[DuplicatesTable]
Script Date: 03/29/2010 21:24:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DuplicatesTable](
[ColA] [varchar](10) NOT NULL, -- the name of the DuplicatesTable
[ColB] [varchar](10) NULL, -- the description of the e DuplicatesTable
)
/*
<doc>
Models a DuplicatesTable for
</doc>
*/
GO
--============================================================ DuplicatesTable START
declare @ScriptFileName varchar(2000)
SELECT @ScriptFileName = '$(ScriptFileName)'
SELECT @ScriptFileName + ' --- DuplicatesTable START ========================================='
declare @TableName varchar(200)
select @TableName = 'DuplicatesTable'
SELECT 'SELECT name from sys.tables where name =''' + @TableName + ''''
SELECT name from sys.tables
where name = @TableName
DECLARE @TableCount INT
SELECT @TableCount = COUNT(name ) from sys.tables
where name =@TableName
if @TableCount=1
SELECT ' DuplicatesTable PASSED. The Table ' + @TableName + ' EXISTS '
ELSE
SELECT ' DuplicatesTable FAILED. The Table ' + @TableName + ' DOES NOT EXIST '
SELECT @ScriptFileName + ' --- DuplicatesTable END ========================================='
--============================================================ DuplicatesTable END
GO
-- END === 1.0.dbo..DuplicatesTable.TableCreate.sql
.
-- START === 1.1..dbo..DuplicatesTable.TableInsert.sql
BEGIN TRANSACTION;
INSERT INTO [dbo].[DuplicatesTable]([ColA], [ColB])
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA', N'ColB' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1' UNION ALL
SELECT N'ColA1', N'ColB1'
COMMIT;
RAISERROR (N'[dbo].[DuplicatesTable]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
GO
-- END === 1.1..dbo..DuplicatesTable.TableInsert.sql
.
-- START === 2.0.RemoveDuplicates.Script.sql
ALTER TABLE dbo.DuplicatesTable ADD
DuplicatesTableId int NOT NULL IDENTITY (1, 1)
GO
-- Then the delete is trivial:
DELETE FROM dbo.DuplicatesTable WHERE DuplicatesTableId NOT IN
(SELECT MAX(DuplicatesTableId) FROM dbo.DuplicatesTable GROUP BY ColA , ColB)
Select * from DuplicatesTable ;
-- END === 2.0.RemoveDuplicates.Script.sql