SQL 从sqlite数据库中删除重复的行

Question

提问by Patches

I have a huge table - 36 million rows - in SQLite3. In this very large table, there are two columns:

我在 SQLite3 中有一个巨大的表 - 3600 万行。在这个非常大的表中，有两列：

hash- text
d- real

hash- 文本
d- 真实的

Some of the rows are duplicates. That is, both hashand dhave the same values. If two hashes are identical, then so are the values of d. However, two identical d's does not imply two identical hash'es.

一些行是重复的。也就是说，两者hash和d具有相同的值。如果两个散列相同，则的值也相同d。然而，两个相同的d's 并不意味着两个相同的hash'es。

I want to delete the duplicate rows. I don't have a primary key column.

我想删除重复的行。我没有主键列。

What's the fastest way to do this?

这样做的最快方法是什么？

Answer 1

回答by Andomar

You need a way to distinguish the rows. Based on your comment, you could use the special rowid columnfor that.

您需要一种方法来区分行。根据您的评论，您可以使用特殊的rowid 列。

To delete duplicates by keeping the lowest rowidper (hash,d):

要通过保持最低的rowidper来删除重复项(hash,d)：

delete   from YourTable
where    rowid not in
         (
         select  min(rowid)
         from    YourTable
         group by
                 hash
         ,       d
         )

Answer 2

回答by MaDa

I guess the fastest would be to use the very database for it: add a new table with the same columns, but with proper constraints (a unique index on hash/real pair?), iterate through the original table and try to insert records in the new table, ignoring constraint violation errors (i.e. continue iterating when exceptions are raised).

我想最快的方法是使用数据库：添加一个具有相同列的新表，但具有适当的约束（散列/实数对上的唯一索引？），遍历原始表并尝试在其中插入记录新表，忽略约束违规错误（即在引发异常时继续迭代）。

Then delete the old table and rename the new to the old one.

然后删除旧表并将新表重命名为旧表。

Answer 3

回答by rsbarro

If adding a primary key is not an option, then one approach would be to store the duplicates DISTINCT in a temp table, delete all of the duplicated records from the existing table, and then add the records back into the original table from the temp table.

如果添加主键不是一种选择，那么一种方法是将重复项 DISTINCT 存储在临时表中，从现有表中删除所有重复记录，然后将记录从临时表中添加回原始表.

For example (written for SQL Server 2008, but the technique is the same for any database):

例如（为 SQL Server 2008 编写，但该技术适用于任何数据库）：

DECLARE @original AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('A', 2)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('B', 1)
INSERT INTO @original VALUES('C', 1)
INSERT INTO @original VALUES('C', 1)

DECLARE @temp AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @temp
SELECT [hash], [d] FROM @original 
GROUP BY [hash], [d]
HAVING COUNT(*) > 1

DELETE O
FROM @original O
JOIN @temp T ON T.[hash] = O.[hash] AND T.[d] = O.[d]

INSERT INTO @original
SELECT [hash], [d] FROM @temp

SELECT * FROM @original

I'm not sure if sqlite has a ROW_NUMBER()type function, but if it does you could also try some of the approaches listed here: Delete duplicate records from a SQL table without a primary key

我不确定 sqlite 是否有ROW_NUMBER()类型函数，但如果有，您也可以尝试这里列出的一些方法：从没有主键的 SQL 表中删除重复记录

SQL 从sqlite数据库中删除重复的行

提问by Patches

回答by Andomar

回答by MaDa

回答by rsbarro

相关推荐

最近更新

标签

SQL 从sqlite数据库中删除重复的行

提问by Patches

回答by Andomar

回答by MaDa

回答by rsbarro

相关推荐

SQL 来自左连接的 Postgres 更新

将现有 SQL Server 2005 数据库中的数据类型 varchar 更改为 nvarchar。有什么问题吗？

如何使用 MS SQL Server Management Studio 删除 SQL 数据库表中的前导空白空间

将两列的数据合并为 SQL 中的一列？

相关推荐

最近更新

标签