PostgreSQL query very slow with limit 1
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21385555/
PostgreSQL query very slow with limit 1
Asked by pat
My queries get very slow when I add a limit 1.
I have a table object_values with timestamped values for objects:
timestamp  | objectID | value
--------------------------------
2014-01-27 | 234      | ksghdf
Per object I want to get the latest value:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;
(I cancelled the query after more than 10 minutes)
This query is very slow when there are no values for a given objectID (it is fast if there are results). If I remove the limit, it tells me nearly instantaneously that there are no results:
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
...
Time: 0.463 ms
An explain shows me that the query without limit uses the index, whereas the query with limit 1 does not make use of the index:
Slow query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2350.44 rows=1 width=126)
   ->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
         Filter: (objectID = 53708)
Fast query:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on working_hours_t  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)
The table contains 44,884,559 rows and 66,762 distinct objectIDs.
I have separate indexes on both fields: timestamp and objectID.
I have done a vacuum analyze on the table and I have reindexed the table.
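For reference, the setup described above corresponds roughly to the following; this is only a sketch, with the column types and index definitions assumed from the sample row and the names in the EXPLAIN output:

-- Assumed shape of the table (column types are guesses based on the sample row)
CREATE TABLE object_values (
    "timestamp" timestamptz NOT NULL,
    objectID    integer     NOT NULL,
    value       text
);

-- Two separate single-column indexes, matching the index names shown in the plans
CREATE INDEX object_values_timestamp ON object_values ("timestamp");
CREATE INDEX object_values_objectID ON object_values (objectID);

-- Maintenance that was already run
VACUUM ANALYZE object_values;
REINDEX TABLE object_values;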
Additionally, the slow query becomes fast when I set the limit to 3 or higher:
explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)
In general I assume it has to do with the planner making wrong assumptions about the execution costs and therefore choosing a slower execution plan.
Is this the real reason? Is there a solution for this?
Accepted answer by Denis de Bernardy
You're running into an issue which relates, I think, to the lack of statistics on row correlations. Consider reporting it to pg-bugs for reference if this happens on the latest version of Postgres.
The interpretation I'd suggest for your plans is:
- limit 1 makes Postgres look for a single row, and in doing so it assumes that your object_id is common enough that it'll show up reasonably quickly in an index scan. Based on the stats you gave, its thinking probably is that it'll need to read ~70 rows on average to find one row that fits; it just doesn't realize that object_id and timestamp correlate to the point where it's actually going to read a large portion of the table.
- limit 3, in contrast, makes it realize that the object_id is uncommon enough, so it seriously considers (and ends up…) top-n sorting an expected 1700 rows with the object_id you want, on grounds that doing so is likely cheaper. For instance, it might know that the distribution of these rows is such that they're all packed in the same area on the disk.
- No limit clause means it'll fetch the 1700 rows anyway, so it goes straight for the index on object_id.
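The "lack of statistics on row correlations" mentioned above can be inspected, at the single-column level, in the pg_stats view: the planner only has per-column n_distinct and physical correlation, and nothing tying objectID to timestamp. A small sketch for the table in this question:

-- Per-column statistics the planner reasons from; there are no cross-column
-- statistics linking objectID to timestamp, which is what misleads it here.
-- (attname is lowercase because the columns were presumably created unquoted.)
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'object_values'
  AND attname IN ('objectid', 'timestamp');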
Solution, btw: add an index on (object_id, timestamp) or (object_id, timestamp desc).
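A minimal sketch of that fix, using the column spelling from the question (the index name is made up):

-- Composite index: equality on objectID plus the timestamp ordering,
-- so the latest row per object can be read directly off the index.
CREATE INDEX object_values_objectid_timestamp
    ON object_values (objectID, "timestamp" DESC);

-- The original query should then satisfy ORDER BY ... LIMIT 1 from the index:
SELECT * FROM object_values
WHERE objectID = 53708
ORDER BY "timestamp" DESC
LIMIT 1;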
Answer by Brendan Nee
You can avoid this issue by adding an unneeded ORDER BY clause to the query.
SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp, objectID DESC limit 1;
Answer by Dan Tanner
I started having similar symptoms on an update-heavy table, and what was needed in my case was
analyze $table_name;
In this case the statistics needed to be refreshed, which then fixed the slow query plans that were occurring.
Supporting docs: https://www.postgresql.org/docs/current/sql-analyze.html
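Applied to the table from this question, that would look like the following sketch; the pg_stat_user_tables query is only there to check when statistics were last gathered:

-- Refresh the planner statistics for the table
ANALYZE object_values;

-- Optional: check when statistics were last collected, manually or by autovacuum
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'object_values';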