Original question: http://stackoverflow.com/questions/6037843/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Extremely slow PostgreSQL query with ORDER and LIMIT clauses
Asked by jakeboxer
I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;
This query takes a very long time to run (Rails times out while running it). There is an index on all IDs in question. The curious part is, if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantaneously.
I'm assuming that the presence of both ORDER BY and LIMIT is making PostgreSQL make some bad choices in query planning. Does anyone have any ideas on how to fix this?
In case it helps, here is the EXPLAIN for all 3 cases:
//////// Both ORDER and LIMIT
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..16663.44 rows=5 width=663)
   -> Nested Loop (cost=0.00..25355084.05 rows=7608 width=663)
         Join Filter: (foos.bar_id = bars.id)
         -> Index Scan Backward using foos_pkey on foos (cost=0.00..11804133.33 rows=4963477 width=663)
               Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
         -> Materialize (cost=0.00..658.96 rows=182 width=4)
               -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4)
                     Index Cond: (baz_id = 13266)
(8 rows)
//////// Just LIMIT
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
LIMIT 5 OFFSET 0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..22.21 rows=5 width=663)
   -> Nested Loop (cost=0.00..33788.21 rows=7608 width=663)
         -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         -> Index Scan using index_foos_on_bar_id on foos (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(7 rows)
//////// Just ORDER
SELECT "foos".*
FROM "foos"
INNER JOIN "bars" ON "foos".bar_id = "bars".id
WHERE (("bars".baz_id = 13266))
ORDER BY "foos"."id" DESC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=36515.17..36534.19 rows=7608 width=663)
   Sort Key: foos.id
   -> Nested Loop (cost=0.00..33788.21 rows=7608 width=663)
         -> Index Scan using index_bars_on_baz_id on bars (cost=0.00..658.05 rows=182 width=4)
               Index Cond: (baz_id = 13266)
         -> Index Scan using index_foos_on_bar_id on foos (cost=0.00..181.51 rows=42 width=663)
               Index Cond: (foos.bar_id = bars.id)
               Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
(8 rows)
Answered by Andrew Lazarus
When you have both the LIMIT and ORDER BY, the optimizer has decided it is faster to limp through the unfiltered records on foos by key descending until it gets five matches for the rest of the criteria. In the other cases, it simply runs the query as a nested loop and returns all the records.
Offhand, I'd say the problem is that PG doesn't grok the joint distribution of the various ids, and that's why the plan is so sub-optimal.
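As a rough illustration of that choice, the Limit costs in the question's plans can be reproduced by scaling each Nested Loop's total cost by the fraction of rows requested (5 of an estimated 7608). This is only a sketch of the planner's arithmetic using numbers copied from the EXPLAIN output in the question; the column aliases are made up for illustration.

```sql
-- Limit cost = (total cost of the child node) * (rows wanted / rows estimated).
-- The startup cost is 0.00 in both plans, so plain scaling is enough here.
SELECT round(25355084.05 * 5 / 7608, 2) AS limit_cost_with_order_by,    -- 16663.44, as in the first plan
       round(33788.21 * 5 / 7608, 2)    AS limit_cost_without_order_by; -- 22.21, as in the second plan
```

The 16663.44 figure assumes the five matching rows turn up early in the backward scan of foos_pkey, i.e. that rows for baz_id = 13266 are spread evenly through foos. That looks cheaper than producing all 7608 estimated join rows (cost 33788.21) and then sorting them, so the planner picks it. If the matching rows are in fact rare or sit at low ids, the scan has to walk most of the table.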
For possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. Stale statistics may explain why your estimated costs are high even on the version that returns fast. If the problem persists, perhaps run the ORDER BY as a subselect and slap the LIMIT on in an outer query.
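A minimal sketch of both suggestions, reusing the table and column names from the question. The alias sorted_foos is made up for illustration, and whether the planner actually picks a different plan depends on the PostgreSQL version and the statistics, so it is worth checking with EXPLAIN.

```sql
-- Refresh planner statistics for both tables first.
ANALYZE foos;
ANALYZE bars;

-- ORDER BY in the subselect, LIMIT applied in the outer query.
-- sorted_foos is a hypothetical alias; PostgreSQL has traditionally
-- required an alias for a subquery in FROM.
SELECT *
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    ORDER BY "foos"."id" DESC
) AS sorted_foos
LIMIT 5;
```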
Answered by Davide Ungari
It probably happens because it tries to order first and then select. Why not try sorting the result in an outer SELECT? Something like: SELECT * FROM (SELECT ... INNER JOIN ETC...) ORDER BY ... DESC
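Spelled out against the question's query, that idea might look like the sketch below. Two things here are assumptions that are not in the answer: the alias matching_foos, and the OFFSET 0 inside the subquery, a commonly used trick to stop PostgreSQL from flattening the subquery back into the outer query (without it, a simple subquery like this is usually merged and planned exactly like the original).

```sql
-- Fetch the matching rows first, then sort and limit outside.
SELECT *
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    OFFSET 0            -- assumption: keeps the subquery from being flattened
) AS matching_foos      -- hypothetical alias
ORDER BY "id" DESC
LIMIT 5;
```

With the join evaluated on its own, the outer query only has to sort the matching rows (an estimated 7608 in the question's plans) instead of walking foos_pkey backwards.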
Answered by ic3b3rg
Your query plan indicates a filter on
(((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
which doesn't appear in the SELECT - where is it coming from?
Also, note that the expression is listed as a "Filter" and not an "Index Cond", which would seem to indicate there's no index applied to it.
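If that filter turns out to matter, one way to get part of it into an index is a partial index, sketched below. The index name is hypothetical, and this assumes status is a plain column on foos, as the plan output suggests. Whether it helps depends on how selective status = 'DONE' is, and it does not by itself change the plan chosen for the ORDER BY plus LIMIT case.

```sql
-- Hypothetical partial index: only rows with status = 'DONE' are indexed,
-- so a lookup by bar_id can skip rows that the Filter would discard anyway.
CREATE INDEX index_foos_done_on_bar_id
    ON foos (bar_id)
    WHERE status = 'DONE';
```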
Answered by Christian Noel
It may be running a full-table scan on "foos". Did you try changing the order of the tables and using a left join instead of an inner join, to see if it displays results faster?
Say:
SELECT "bars"."id", "foos".*
FROM "bars"
LEFT JOIN "foos" ON "bars"."id" = "foos"."bar_id"
WHERE "bars"."baz_id" = 13266
ORDER BY "foos"."id" DESC
LIMIT 5 OFFSET 0;