PostgreSQL query very slow with limit 1
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21385555/
PostgreSQL query very slow with limit 1
Asked by pat
My queries get very slow when I add a limit 1.
I have a table object_values with timestamped values for objects:
timestamp  | objectID | value
--------------------------------
2014-01-27 | 234      | ksghdf
Per object I want to get the latest value:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;
(I cancelled the query after more than 10 minutes)
This query is very slow when there are no values for a given objectID (it is fast if there are results). If I remove the limit, it tells me nearly instantaneously that there are no results:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
...
Time: 0.463 ms
An explain shows me that the query without limit uses the index, whereas the query with limit 1 does not make use of the index:
Slow query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2350.44 rows=1 width=126)
   ->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
         Filter: (objectID = 53708)
Fast query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on working_hours_t  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)
The table contains 44,884,559 rows and 66,762 distinct objectIDs.
I have separate indexes on both fields: timestamp and objectID.
I have done a vacuum analyze on the table and I have reindexed the table.
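For reference, the setup described above corresponds roughly to the following; this is only a sketch, with the column types and index definitions assumed from the sample row and the names in the EXPLAIN output:

-- Assumed shape of the table (column types are guesses based on the sample row)
CREATE TABLE object_values (
    "timestamp" timestamptz NOT NULL,
    objectID    integer     NOT NULL,
    value       text
);

-- Two separate single-column indexes, matching the index names shown in the plans
CREATE INDEX object_values_timestamp ON object_values ("timestamp");
CREATE INDEX object_values_objectID ON object_values (objectID);

-- Maintenance that was already run
VACUUM ANALYZE object_values;
REINDEX TABLE object_values;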
Additionally, the slow query becomes fast when I set the limit to 3 or higher:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)
In general I assume it has to do with the planner making wrong assumptions about the execution costs and therefore choosing a slower execution plan.
Is this the real reason? Is there a solution for this?
Accepted answer by Denis de Bernardy
You're running into an issue which relates, I think, to the lack of statistics on row correlations. Consider reporting it to pg-bugs for reference if this happens on the latest version of Postgres.
The interpretation I'd suggest for your plans is:
- limit 1 makes Postgres look for a single row, and in doing so it assumes that your object_id is common enough that it'll show up reasonably quickly in an index scan. Based on the stats you gave, its thinking probably is that it'll need to read ~70 rows on average to find one row that fits; it just doesn't realize that object_id and timestamp correlate to the point where it's actually going to read a large portion of the table.
- limit 3, in contrast, makes it realize that the object_id is uncommon enough, so it seriously considers (and ends up…) top-n sorting an expected 1700 rows with the object_id you want, on grounds that doing so is likely cheaper. For instance, it might know that the distribution of these rows is such that they're all packed in the same area on the disk.
- No limit clause means it'll fetch the 1700 rows anyway, so it goes straight for the index on object_id.
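The "lack of statistics on row correlations" mentioned above can be inspected, at the single-column level, in the pg_stats view: the planner only has per-column n_distinct and physical correlation, and nothing tying objectID to timestamp. A small sketch for the table in this question:

-- Per-column statistics the planner reasons from; there are no cross-column
-- statistics linking objectID to timestamp, which is what misleads it here.
-- (attname is lowercase because the columns were presumably created unquoted.)
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'object_values'
  AND attname IN ('objectid', 'timestamp');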
Solution, btw: add an index on (object_id, timestamp) or (object_id, timestamp desc).
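A minimal sketch of that fix, using the column spelling from the question (the index name is made up):

-- Composite index: equality on objectID plus the timestamp ordering,
-- so the latest row per object can be read directly off the index.
CREATE INDEX object_values_objectid_timestamp
    ON object_values (objectID, "timestamp" DESC);

-- The original query should then satisfy ORDER BY ... LIMIT 1 from the index:
SELECT * FROM object_values
WHERE objectID = 53708
ORDER BY "timestamp" DESC
LIMIT 1;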
Answer by Brendan Nee
You can avoid this issue by adding an unneeded ORDER BY clause to the query.
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp, objectID DESC limit 1;
Answer by Dan Tanner
I started having similar symptoms on an update-heavy table, and what was needed in my case was
analyze $table_name;
In this case the statistics needed to be refreshed, which then fixed the slow query plans that were occurring.
Supporting docs: https://www.postgresql.org/docs/current/sql-analyze.html
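Applied to the table from this question, that would look like the following sketch; the pg_stat_user_tables query is only there to check when statistics were last gathered:

-- Refresh the planner statistics for the table
ANALYZE object_values;

-- Optional: check when statistics were last collected, manually or by autovacuum
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'object_values';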