Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/14554302/

Postgres query optimization (forcing an index scan)

Tags: postgresql, indexing, query-optimization, postgresql-9.1, postgresql-performance

Asked by Jeff

Below is my query. I am trying to get it to use an index scan, but it will only seq scan.

By the way, the metric_data table has 130 million rows. The metrics table has about 2000 rows.

metric_data table columns:

  metric_id integer
, t timestamp
, d double precision
, PRIMARY KEY (metric_id, t)

How can I get this query to use my PRIMARY KEY index?

SELECT
    S.metric,
    D.t,
    D.d
FROM metric_data D
INNER JOIN metrics S
    ON S.id = D.metric_id
WHERE S.NAME = ANY (ARRAY ['cpu', 'mem'])
  AND D.t BETWEEN '2012-02-05 00:00:00'::TIMESTAMP
              AND '2012-05-05 00:00:00'::TIMESTAMP;

EXPLAIN:

Hash Join  (cost=271.30..3866384.25 rows=294973 width=25)
  Hash Cond: (d.metric_id = s.id)
  ->  Seq Scan on metric_data d  (cost=0.00..3753150.28 rows=29336784 width=20)
        Filter: ((t >= '2012-02-05 00:00:00'::timestamp without time zone)
             AND (t <= '2012-05-05 00:00:00'::timestamp without time zone))
  ->  Hash  (cost=270.44..270.44 rows=68 width=13)
        ->  Seq Scan on metrics s  (cost=0.00..270.44 rows=68 width=13)
              Filter: ((sym)::text = ANY ('{cpu,mem}'::text[]))

Answer by Erwin Brandstetter

For testing purposes you can force the use of the index by "disabling" sequential scans - best in your current session only:

SET enable_seqscan = OFF;

Details are in the manual. I put "disabling" in quotes because you cannot actually disable sequential table scans; the setting only makes any other available option preferable to Postgres. This will prove that the multicolumn index on (metric_id, t) can be used - just not as effectively as an index on the leading column.
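
To compare the two plans directly, the setting can be scoped to a single transaction. This is a minimal sketch that reuses the query from the question; `SET LOCAL` reverts at the end of the transaction, so nothing leaks into other work in the session. Note that `EXPLAIN ANALYZE` actually executes the query, which may take a while on a table of this size.

```sql
BEGIN;

-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;

-- Show the chosen plan together with real timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT S.metric, D.t, D.d
FROM metric_data D
INNER JOIN metrics S ON S.id = D.metric_id
WHERE S.NAME = ANY (ARRAY['cpu', 'mem'])
  AND D.t BETWEEN '2012-02-05 00:00:00'::timestamp
              AND '2012-05-05 00:00:00'::timestamp;

ROLLBACK;  -- the setting reverts here
```

Running the same `EXPLAIN` without the `SET LOCAL` line gives the baseline plan to compare against.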

You will probably get better results by switching the order of columns in your PRIMARY KEY (and, with it, the index used to implement it behind the scenes) to (t, metric_id). Or create an additional index with the columns reversed like that.
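
Both options could look roughly like this. The index name is hypothetical, and `metric_data_pkey` is only the default name Postgres gives a primary key constraint, so check the actual name first. On a table with 130 million rows either statement will take some time.

```sql
-- Option 1: add a second index with the columns reversed.
-- CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY metric_data_t_metric_id_idx
    ON metric_data (t, metric_id);

-- Option 2: replace the primary key in one statement.
-- This holds an exclusive lock on the table while the new index is built.
ALTER TABLE metric_data
    DROP CONSTRAINT metric_data_pkey,
    ADD PRIMARY KEY (t, metric_id);
```

Option 1 keeps the existing primary key index, which still serves lookups by metric_id, at the cost of extra disk space and write overhead.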

You do not normally have to force better query plans by manual intervention. If setting enable_seqscan = OFF leads to a much better plan, something is probably not right in your database. Consider this related answer:

Answer by mvp

You cannot force an index scan in this case, because it would not make the query faster.

You currently have an index on metric_data (metric_id, t), but the server cannot take advantage of this index for your query, because it needs to be able to discriminate by metric_data.t only (without metric_id), and there is no such index. The server can use sub-fields of compound indexes, but only starting from the beginning. For example, searching by metric_id will be able to use this index.
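
One way to see the leading-column rule in practice is to compare the plans for two simplified queries. This is only a sketch: the id 42 is a made-up value, and the plans you get depend on your data and settings.

```sql
-- Leading column constrained: the condition on metric_id narrows the
-- index on (metric_id, t) to one contiguous range.
EXPLAIN
SELECT t, d
FROM metric_data
WHERE metric_id = 42  -- hypothetical id
  AND t BETWEEN '2012-02-05'::timestamp AND '2012-05-05'::timestamp;

-- Only the second column constrained: there is no condition on metric_id
-- to narrow the index range, so the index is far less attractive here.
EXPLAIN
SELECT metric_id, t, d
FROM metric_data
WHERE t BETWEEN '2012-02-05'::timestamp AND '2012-05-05'::timestamp;
```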

If you create another index on metric_data (t), your query will make use of that index and will work much faster.
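
A minimal sketch of that index follows; the index name is hypothetical. Whether the planner then picks it depends on selectivity: the plan in the question estimates roughly 29 million matching rows for the date range, so a sequential scan may still win for ranges that wide.

```sql
-- Single-column index on the timestamp, for range filters on t alone.
CREATE INDEX metric_data_t_idx ON metric_data (t);

-- Refresh planner statistics so estimates reflect the current data.
ANALYZE metric_data;
```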

Also, you should make sure that you have an index on metrics (id).
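
The existing indexes can be listed from the `pg_indexes` system view, for example as below. If id is declared as the primary key of metrics, a unique index on it already exists and nothing more is needed.

```sql
-- List every index defined on the metrics table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'metrics';
```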

Answer by joop

It appears you are lacking suitable FK constraints:

CREATE TABLE metric_data
( metric_id integer
, t timestamp
, d double precision
, PRIMARY KEY (metric_id, t)
, CONSTRAINT metrics_xxx_fk FOREIGN KEY (metric_id) REFERENCES metrics (id)
);

and in table metrics:

CREATE TABLE metrics
( id INTEGER PRIMARY KEY
...
);
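
Since both tables already exist, the constraint would more likely be added in place than by recreating the table. A sketch, assuming metrics (id) is already a primary key or has a unique constraint, and reusing the placeholder constraint name from above:

```sql
-- Add the foreign key to the existing table.
-- Postgres validates all existing rows, which can be slow on a large table.
ALTER TABLE metric_data
    ADD CONSTRAINT metrics_xxx_fk
    FOREIGN KEY (metric_id) REFERENCES metrics (id);
```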

Also check whether your statistics are sufficient (and fine-grained enough, since you intend to select 0.2 % of the metric_data table).
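
One way to do that is to raise the per-column statistics target and re-analyze, then inspect what the planner knows. The value 1000 is an arbitrary example, not a recommendation; the default target is 100.

```sql
-- Collect a larger sample for the timestamp column.
ALTER TABLE metric_data ALTER COLUMN t SET STATISTICS 1000;
ANALYZE metric_data;

-- Inspect the collected statistics for each column of the table.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'metric_data';
```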

Answer by Gabriel Bastos

Have you tried to use:

WHERE S.NAME = ANY (VALUES ('cpu'), ('mem')) instead of ARRAY

like here
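
Applied to the query from the question, that suggestion would read as follows. This is only the rewritten form; whether it changes the plan has to be checked with EXPLAIN on your own data.

```sql
SELECT
    S.metric,
    D.t,
    D.d
FROM metric_data D
INNER JOIN metrics S
    ON S.id = D.metric_id
-- VALUES list used as a subquery in place of the ARRAY literal.
WHERE S.NAME = ANY (VALUES ('cpu'), ('mem'))
  AND D.t BETWEEN '2012-02-05 00:00:00'::timestamp
              AND '2012-05-05 00:00:00'::timestamp;
```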
