PostgreSQL UNION takes 10 times as long as running the individual queries

Question

Question by lanrat

I am trying to get the difference between two nearly identical tables in PostgreSQL. The queries I am currently running are:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB;

and

SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

Each of the above queries takes about 2 minutes to run (it's a large table).

I wanted to combine the two queries in the hope of saving time, so I tried:

SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;

While it works, it takes 20 minutes to run! I would have guessed it would take at most 4 minutes, the total time of running the two queries individually.

Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?

UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running the two queries separately. Am I correct in saying that UNION ALL is not going to speed this up at all?

Answer 1

Answer by RThomas

Regarding your "extra work" question: yes. UNION not only combines the results of the two queries but also goes through them and removes duplicate rows. It's the same as applying SELECT DISTINCT to the combined result.

For this reason, especially combined with your EXCEPT statements, UNION ALL would likely be faster.
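
To see the deduplication concretely, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It assumes that SQLite treats UNION and UNION ALL the same way PostgreSQL does for this purpose. The tables t1 and t2 and their values are hypothetical.

```python
import sqlite3

# Hypothetical in-memory tables that share one row (x = 2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (x INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (2), (3);
""")

# UNION removes the duplicate row, so it has to compare rows across both inputs.
union_rows = conn.execute(
    "SELECT x FROM t1 UNION SELECT x FROM t2 ORDER BY x").fetchall()

# UNION ALL just concatenates both inputs and keeps the duplicate.
union_all_rows = conn.execute(
    "SELECT x FROM t1 UNION ALL SELECT x FROM t2 ORDER BY x").fetchall()

# UNION behaves like DISTINCT applied to the concatenated rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT x FROM (SELECT x FROM t1 UNION ALL SELECT x FROM t2) ORDER BY x"
).fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (3,)]
```

The duplicate check is what UNION ALL skips. How much time that saves on a large table depends on the plan PostgreSQL chooses.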

Read more here: http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html

Answer 2

Answer by dave

In addition to combining the results of the first and second query, UNION by default also removes duplicate records (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records, because a row in tableA EXCEPT tableB is not in tableB and so cannot also appear in tableB EXCEPT tableA. The extra work of looking for duplicates can therefore be avoided by specifying UNION ALL:

(SELECT * FROM tableA EXCEPT SELECT * FROM tableB)
UNION ALL
(SELECT * FROM tableB EXCEPT SELECT * FROM tableA);
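
The claim that the two EXCEPT halves cannot share a row can be checked with a tiny model. Each table is represented as a Python set of row tuples, which matches the distinct-row output of EXCEPT without ALL. The sample rows are hypothetical.

```python
# Hypothetical rows; each table is modelled as a set of row tuples.
table_a = {(1, "x"), (2, "y"), (3, "z")}
table_b = {(2, "y"), (3, "changed"), (4, "w")}

only_in_a = table_a - table_b   # models: SELECT * FROM tableA EXCEPT SELECT * FROM tableB
only_in_b = table_b - table_a   # models: SELECT * FROM tableB EXCEPT SELECT * FROM tableA

# A row cannot be both "in A but not B" and "in B but not A", so the halves are disjoint.
assert not (only_in_a & only_in_b)

# Concatenating the halves (what UNION ALL does) therefore produces no duplicates.
combined = sorted(only_in_a) + sorted(only_in_b)
```

Because the halves are disjoint, a UNION here would find nothing to remove. The deduplication pass is pure overhead.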

Answer 3

Answer by onedaywhen

I don't think your code returns the result set you intend it to. I think you actually want to do this:

SELECT * 
  FROM (
        SELECT * FROM tableA 
        EXCEPT 
        SELECT * FROM tableB
       ) AS T1
UNION 
SELECT * 
  FROM (
        SELECT * FROM tableB 
        EXCEPT 
        SELECT * FROM tableA
       ) AS T2;

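In other words, you want the rows that appear in exactly one of the two tables (the set of mutually exclusive members). If so, you need to read up on relational operator precedence in SQL ;) In PostgreSQL, INTERSECT binds more tightly than UNION and EXCEPT, and UNION and EXCEPT are evaluated left to right unless parentheses say otherwise. Once you have read up on this, you may realise the above can be rationalised to: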

SELECT * FROM tableA 
UNION 
SELECT * FROM tableB
EXCEPT 
SELECT * FROM tableA 
INTERSECT
SELECT * FROM tableB;
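
The equivalence of the two forms can be sketched with Python sets standing in for the distinct-row results of the SQL set operators (sample rows are hypothetical). Because INTERSECT binds first, the shorter query groups as (A UNION B) EXCEPT (A INTERSECT B).

```python
# Hypothetical rows; each table is modelled as a set of row tuples.
table_a = {(1, "x"), (2, "y"), (3, "z")}
table_b = {(2, "y"), (3, "changed"), (4, "w")}

# (A EXCEPT B) UNION (B EXCEPT A): the parenthesised form.
via_two_excepts = (table_a - table_b) | (table_b - table_a)

# (A UNION B) EXCEPT (A INTERSECT B): how PostgreSQL groups the shorter query,
# since INTERSECT binds more tightly than UNION and EXCEPT.
via_intersect = (table_a | table_b) - (table_a & table_b)

# Both are the symmetric difference, which Python spells as ^.
assert via_two_excepts == via_intersect == table_a ^ table_b
```

Note that a row that changed between the tables shows up twice, once with each version, because the whole row is compared.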

FWIW, using subqueries (derived tables T1 and T2) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:

SELECT * 
  FROM (
        SELECT * 
          FROM (
                SELECT * 
                  FROM tableA 
                EXCEPT 
                SELECT * 
                  FROM tableB
               ) AS T2
        UNION
        SELECT * 
          FROM tableB
       ) AS T1
EXCEPT 
SELECT * 
  FROM tableA;

Every row of tableA EXCEPT tableB is in tableA and is removed again by the final EXCEPT, so the above can be rationalised to:

SELECT * 
  FROM tableB 
EXCEPT 
SELECT * 
  FROM tableA;

...which I think is not what is intended.
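
The following sketch checks the claim that the unparenthesised query collapses to tableB EXCEPT tableA. It uses Python's sqlite3 module with hypothetical sample rows. It relies on one assumption: SQLite groups all compound operators strictly left to right, which matches PostgreSQL for UNION and EXCEPT. It would not be a faithful stand-in for a query using INTERSECT, which PostgreSQL binds more tightly.

```python
import sqlite3

# Hypothetical sample data; (3, ...) is a row that changed between the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'changed'), (4, 'w');
""")

# The query from the question, without parentheses:
# evaluated as ((A EXCEPT B) UNION B) EXCEPT A.
original = conn.execute("""
    SELECT * FROM tableA EXCEPT SELECT * FROM tableB
    UNION
    SELECT * FROM tableB EXCEPT SELECT * FROM tableA
""").fetchall()

# What it collapses to.
b_minus_a = conn.execute(
    "SELECT * FROM tableB EXCEPT SELECT * FROM tableA").fetchall()

# The intended symmetric difference, grouped explicitly with derived tables.
intended = conn.execute("""
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    UNION ALL
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
""").fetchall()
```

The original query also has to process the whole of tableB through the UNION and then subtract the whole of tableA again. That may help explain why it runs so much longer than either query on its own.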

Answer 4

Answer by peufeu

You could use tableA FULL OUTER JOIN tableB, which would give you what you want (with a proper join condition) while scanning each table only once. It would probably be faster than the two queries above.

Post more info please.
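
The answer does not specify the join condition. The sketch below assumes a hypothetical primary-key column `id` and models in plain Python what a FULL OUTER JOIN on that key returns when only mismatching pairs are kept. It is an illustration of the result shape, not of how PostgreSQL executes the join.

```python
# Hypothetical rows keyed by an assumed primary-key column "id".
table_a = {1: ("x",), 2: ("y",), 3: ("z",)}
table_b = {2: ("y",), 3: ("changed",), 4: ("w",)}

def full_outer_diff(a, b):
    """Model of tableA FULL OUTER JOIN tableB ON a.id = b.id, keeping only mismatches."""
    diff = []
    # A FULL OUTER JOIN yields every key present on either side.
    for key in sorted(a.keys() | b.keys()):
        left = a.get(key)    # None plays the role of the NULL-extended side
        right = b.get(key)
        if left != right:    # keep keys missing on one side or whose values differ
            diff.append((key, left, right))
    return diff

diff = full_outer_diff(table_a, table_b)
```

Unlike the EXCEPT approach, a changed row comes back as a single pair showing both versions side by side. In SQL, non-key columns would typically be compared with IS DISTINCT FROM so that NULLs compare sensibly.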
