SQL 在PostgreSQL中查找两个大表之间的差异

Question

提问by odiszapc

I have two similar tables in Postgres with just one 32-byte latin field (simple md5 hash). Both tables have ~30,000,000 rows. Tables have little difference (10-1000 rows are different)

我在 Postgres 中有两个类似的表，只有一个 32 字节的拉丁字段（简单的 md5 哈希）。两个表都有大约 30,000,000 行。表格差别不大（10-1000行不一样）

Is it possible with Postgres to find a difference between these tables, the result should be 10-1000 rows I described above.

Postgres 是否有可能找到这些表之间的差异，结果应该是我上面描述的 10-1000 行。

This is not a real task, I just want to know about how PostgreSQL deals with JOIN-like logic.

这不是一个真正的任务，我只是想知道 PostgreSQL 如何处理类似 JOIN 的逻辑。

Answer 1

回答by Erwin Brandstetter

The best option is probably an EXISTSanti-semi-join:

最好的选择可能是EXISTS反半连接：

tbl1is the table with surplus rows in this example:

tbl1是本例中具有剩余行的表：

SELECT *
FROM   tbl1
WHERE  NOT EXISTS (SELECT 1 FROM tbl2 WHERE tbl2.col = tbl1.col);

If you don't know which table has surplus rows or both have, you can either repeat the above query after switching table names, or:

如果您不知道哪个表有多余的行或两者都有，您可以在切换表名后重复上述查询，或者：

SELECT *
FROM   tbl1
FULL   OUTER JOIN tbl2 USING (col)
WHERE  tbl2 col IS NULL OR
       tbl1.col IS NULL;

Overview over basic techniques in a later post:

在后面的文章中对基本技术的概述：

Select rows which are not present in other table

选择其他表中不存在的行

BTW, it would be muchmore efficient to use uuidcolumns for md5 hashes:

顺便说一句，这将是很多更有效地使用uuid列MD5哈希值：

Answer 2

回答by ThomasH

To augment existing answers I use the row()function for the join condition. This allows you to compare entire rows. E.g. my typical query to see the symmetric difference looks like this

为了增加现有的答案，我使用row()了连接条件的函数。这允许您比较整行。例如，我查看对称差异的典型查询如下所示

select *
from tbl1
full outer join tbl2 
    on row(tbl1) = row(tbl2)
where tbl1.col is null
or    tbl2.col is null

Answer 3

回答by 0xCAFEBABE

In my experience, NOT IN with a subquery takes a very long time. I'd do it with an inclusive join:

根据我的经验，NOT IN 使用子查询需要很长时间。我会通过包容性加入来做到这一点：

DELETE FROM table1 where ID IN (
SELECT id FROM table1
LEFT OUTER JOIN table2 on table1.hashfield = table2.hashfield
WHERE table2.hashfield IS NULL)

And then do the same the other way around for the other table.

然后对另一张桌子以相反的方式做同样的事情。

SQL 在PostgreSQL中查找两个大表之间的差异

提问by odiszapc

回答by Erwin Brandstetter

回答by ThomasH

回答by 0xCAFEBABE

相关推荐

最近更新

标签

SQL 在PostgreSQL中查找两个大表之间的差异

提问by odiszapc

回答by Erwin Brandstetter

回答by ThomasH

回答by 0xCAFEBABE

相关推荐

SQL varchar 值的转换溢出了一个 int 列

SQL 改进 Firebird 数据库上的“% LIKE %”查询

SQL 将多行合并为多列

SQL 在 ssrs 表达式中使用“like”

相关推荐

最近更新

标签