SQL 从sqlite数据库中删除重复的行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/8190541/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 13:19:50  来源:igfitidea点击:

Deleting duplicate rows from sqlite database

sqldatabasesqlite

提问by Patches

I have a huge table - 36 million rows - in SQLite3. In this very large table, there are two columns:

我在 SQLite3 中有一个巨大的表 - 3600 万行。在这个非常大的表中,有两列:

  • hash- text
  • d- real
  • hash- 文本
  • d- 真实的

Some of the rows are duplicates. That is, both hashand dhave the same values. If two hashes are identical, then so are the values of d. However, two identical d's does not imply two identical hash'es.

一些行是重复的。也就是说,两者hashd具有相同的值。如果两个散列相同,则 的值也相同d。然而,两个相同的d's 并不意味着两个相同的hash'es。

I want to delete the duplicate rows. I don't have a primary key column.

我想删除重复的行。我没有主键列。

What's the fastest way to do this?

这样做的最快方法是什么?

回答by Andomar

You need a way to distinguish the rows. Based on your comment, you could use the special rowid columnfor that.

您需要一种方法来区分行。根据您的评论,您可以使用特殊的rowid 列

To delete duplicates by keeping the lowest rowidper (hash,d):

要通过保持最低的rowidper来删除重复项(hash,d)

delete   from YourTable
where    rowid not in
         (
         select  min(rowid)
         from    YourTable
         group by
                 hash
         ,       d
         )

回答by MaDa

I guess the fastest would be to use the very database for it: add a new table with the same columns, but with proper constraints (a unique index on hash/real pair?), iterate through the original table and try to insert records in the new table, ignoring constraint violation errors (i.e. continue iterating when exceptions are raised).

我想最快的方法是使用数据库:添加一个具有相同列的新表,但具有适当的约束(散列/实数对上的唯一索引?),遍历原始表并尝试在其中插入记录新表,忽略约束违规错误(即在引发异常时继续迭代)。

Then delete the old table and rename the new to the old one.

然后删除旧表并将新表重命名为旧表。

回答by rsbarro

If adding a primary key is not an option, then one approach would be to store the duplicates DISTINCT in a temp table, delete all of the duplicated records from the existing table, and then add the records back into the original table from the temp table.

如果添加主键不是一种选择,那么一种方法是将重复项 DISTINCT 存储在临时表中,从现有表中删除所有重复记录,然后将记录从临时表中添加回原始表.

For example (written for SQL Server 2008, but the technique is the same for any database):

例如(为 SQL Server 2008 编写,但该技术适用于任何数据库):

DECLARE @original AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('A', 2)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('B', 1)
INSERT INTO @original VALUES('C', 1)
INSERT INTO @original VALUES('C', 1)

DECLARE @temp AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @temp
SELECT [hash], [d] FROM @original 
GROUP BY [hash], [d]
HAVING COUNT(*) > 1

DELETE O
FROM @original O
JOIN @temp T ON T.[hash] = O.[hash] AND T.[d] = O.[d]

INSERT INTO @original
SELECT [hash], [d] FROM @temp

SELECT * FROM @original

I'm not sure if sqlite has a ROW_NUMBER()type function, but if it does you could also try some of the approaches listed here: Delete duplicate records from a SQL table without a primary key

我不确定 sqlite 是否有ROW_NUMBER()类型函数,但如果有,您也可以尝试这里列出的一些方法:从没有主键的 SQL 表中删除重复记录