Original source: http://stackoverflow.com/questions/6337678/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
PostgreSQL UNION takes 10 times as long as running the individual queries
Asked by lanrat
I am trying to get the diff between two nearly identical tables in PostgreSQL. The current queries I am running are:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
Each of the above queries takes about 2 minutes to run (it's a large table).
I wanted to combine the two queries in the hope of saving time, so I tried:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
And while it works, it takes 20 minutes to run! I expected it to take at most 4 minutes, the combined time of running the two queries individually.
Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?
UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running the two queries separately (about 4 minutes in total). Am I correct in saying that UNION (ALL) is not going to speed this up at all?
Answered by RThomas
Regarding your "extra work" question: yes. UNION not only combines the results of the two queries but also goes through them and removes duplicates. It is the same as applying DISTINCT to the combined result.
For this reason, especially combined with your EXCEPT statements, UNION ALL would likely be faster.
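To make the duplicate-removal step concrete, here is a minimal sketch of the equivalence described above. The table names t1 and t2 are hypothetical and are assumed to have matching column lists; they are not from the question.

```sql
-- Hypothetical tables t1 and t2 with identical column lists.

-- UNION: concatenates both results, then removes duplicate rows.
SELECT * FROM t1
UNION
SELECT * FROM t2;

-- Same result, with the de-duplication step written out explicitly.
SELECT DISTINCT *
FROM (
    SELECT * FROM t1
    UNION ALL
    SELECT * FROM t2
) AS combined;

-- UNION ALL alone: concatenation only, duplicates are kept.
SELECT * FROM t1
UNION ALL
SELECT * FROM t2;
```

The de-duplication usually needs a sort or a hash over the whole combined result, which is where the extra cost on a large table would come from.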
Read more here: http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html
Answered by dave
In addition to combining the results of the first and second query, UNION by default also removes duplicate records (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records, so the extra work of looking for duplicates can be avoided by specifying UNION ALL:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION ALL
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
Answered by onedaywhen
I don't think your code returns the result set you intend it to. I rather think you want to do this:
SELECT *
FROM (
    SELECT * FROM tableA
    EXCEPT
    SELECT * FROM tableB
) AS T1
UNION
SELECT *
FROM (
    SELECT * FROM tableB
    EXCEPT
    SELECT * FROM tableA
) AS T2;
In other words, you want the set of mutually exclusive members. If so, you need to read up on relational operator precedence in SQL ;) (in short, INTERSECT binds more tightly than UNION and EXCEPT, which are otherwise evaluated left to right). Once you have, you may realise the above can be rationalised to:
SELECT * FROM tableA
UNION
SELECT * FROM tableB
EXCEPT
SELECT * FROM tableA
INTERSECT
SELECT * FROM tableB;
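If the precedence rules feel too implicit, PostgreSQL also accepts parentheses directly around set operations. The sketch below spells out the grouping of the query above and, as a variation, a parenthesised form of the two-way diff. It uses only tableA and tableB from the question. The UNION ALL in the second statement rests on my assumption that no de-duplication is needed there, since the two EXCEPT results cannot share rows.

```sql
-- Grouping of the shortened query made explicit:
-- INTERSECT binds tightest; UNION and EXCEPT then group left to right.
(SELECT * FROM tableA UNION SELECT * FROM tableB)
EXCEPT
(SELECT * FROM tableA INTERSECT SELECT * FROM tableB);

-- Two-way diff with each EXCEPT grouped explicitly.
-- Without the parentheses this would be read as
-- ((A EXCEPT B) UNION ALL B) EXCEPT A.
(SELECT * FROM tableA EXCEPT SELECT * FROM tableB)
UNION ALL
(SELECT * FROM tableB EXCEPT SELECT * FROM tableA);
```

Whether either form is actually faster on a given data set is something only EXPLAIN ANALYZE on the real tables can show.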
FWIW, using subqueries (derived tables T1 and T2) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:
SELECT *
FROM (
    SELECT *
    FROM (
        SELECT *
        FROM tableA
        EXCEPT
        SELECT *
        FROM tableB
    ) AS T2
    UNION
    SELECT *
    FROM tableB
) AS T1
EXCEPT
SELECT *
FROM tableA;
The above can be rationalised to:
SELECT *
FROM tableB
EXCEPT
SELECT *
FROM tableA;
...which I think is not what you intended.
Answered by peufeu
You could use tableA FULL OUTER JOIN tableB, which would give you what you want (with a proper join condition) with only one scan of each table. It would probably be faster than the 2 queries above.
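As a rough illustration of the join approach, here is a sketch under assumptions that the question does not confirm: both tables are assumed to have a unique, non-null key column id and two further columns col1 and col2. All three column names are hypothetical.

```sql
-- Hypothetical schema: tableA and tableB each have a unique, NOT NULL
-- key "id" plus data columns col1 and col2 (names are assumptions).
SELECT a.id   AS a_id,   b.id   AS b_id,
       a.col1 AS a_col1, b.col1 AS b_col1,
       a.col2 AS a_col2, b.col2 AS b_col2
FROM tableA AS a
FULL OUTER JOIN tableB AS b ON a.id = b.id
WHERE a.id IS NULL                      -- row exists only in tableB
   OR b.id IS NULL                      -- row exists only in tableA
   OR a.col1 IS DISTINCT FROM b.col1    -- same key, different values
   OR a.col2 IS DISTINCT FROM b.col2;   -- IS DISTINCT FROM treats NULLs as comparable
```

Unlike the EXCEPT queries, this returns one row per differing key with both versions side by side, so a changed row shows up once rather than twice.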
Please post more info.