Why does PostgreSQL perform sequential scan on indexed column?

Note: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5203755/

Date: 2020-09-10 22:50:54  Source: igfitidea


Tags: postgresql, indexing, sequence, database-scan

Asked by Alex Vayda

Very simple example - one table, one index, one query:


CREATE TABLE book
(
  id bigserial NOT NULL,
  "year" integer
  -- other columns...
);

CREATE INDEX book_year_idx ON book (year);

EXPLAIN
 SELECT *
   FROM book b
  WHERE b.year > 2009;

gives me:


Seq Scan on book b  (cost=0.00..25663.80 rows=105425 width=622)
  Filter: (year > 2009)

Why doesn't it perform an index scan instead? What am I missing?


Answer by a_horse_with_no_name

If the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan.


This is because an index scan requires several IO operations for each row (look up the row in the index, then retrieve the row from the heap), whereas a sequential scan requires only a single IO for each row, or even less, because a block (page) on disk contains more than one row, so more than one row can be fetched with a single IO operation.

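To make the IO argument above concrete, here is a toy Python model of it. The assumptions are ours, not the answer's: in an index scan each matching row costs exactly two IOs (one index lookup, one heap fetch), in a sequential scan every heap page is read once, caching and the physical order of rows are ignored, and the figure of 12 rows per page is hypothetical (roughly an 8 kB page divided by the width=622 bytes estimated in the plan above). Under these assumptions the break-even point is 1 / (2 * rows_per_page) of the table, about 4% here.

```python
import math


def seq_scan_ios(total_rows: int, rows_per_page: int) -> int:
    """A sequential scan reads every heap page once, however many rows match."""
    return math.ceil(total_rows / rows_per_page)


def index_scan_ios(matching_rows: int, ios_per_row: int = 2) -> int:
    """Toy model: each matching row costs one index lookup plus one heap fetch.

    Ignores caching, index pages shared between rows, and heap ordering.
    """
    return matching_rows * ios_per_row


def cheaper_plan(total_rows: int, matching_rows: int, rows_per_page: int) -> str:
    """Pick whichever plan needs fewer IO operations in this toy model."""
    seq = seq_scan_ios(total_rows, rows_per_page)
    idx = index_scan_ios(matching_rows)
    return "index scan" if idx < seq else "seq scan"


if __name__ == "__main__":
    total = 1_000_000   # hypothetical table size
    rows_per_page = 12  # hypothetical: ~8 kB page / ~620-byte rows
    for pct in (1, 2, 4, 5, 10, 30):
        matching = total * pct // 100
        print(f"{pct:>3}% match -> {cheaper_plan(total, matching, rows_per_page)}")
```

With these numbers the model prefers the index up to 4% of the rows and the sequential scan from 5% upwards. PostgreSQL's real planner weights random and sequential page reads differently (the random_page_cost and seq_page_cost settings) and accounts for caching, so the actual threshold on a given system will differ.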

By the way, this is true for other DBMSs as well, setting aside optimizations such as "index-only scans" (but for a SELECT * it's highly unlikely that a DBMS would choose an "index-only scan").


Answer by Frank Heikens

Did you ANALYZE the table/database? And what about the statistics? When there are many records where year > 2009, a sequential scan might be faster than an index scan.

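ANALYZE matters because it collects the per-column statistics (visible in the pg_stats view, including histogram_bounds) that the planner uses to estimate how many rows satisfy year > 2009. Below is a simplified Python sketch of that kind of estimate. It assumes an equi-depth histogram (every bucket holds the same number of rows) and uniformly spread values inside each bucket; the real planner also uses most-common values, null fractions and other details left out here, and the histogram values are hypothetical.

```python
from bisect import bisect_right


def frac_greater_than(bounds: list, value: float) -> float:
    """Estimate the fraction of rows with column > value from equi-depth histogram bounds.

    bounds holds len(bounds) - 1 buckets, each assumed to contain the same number
    of rows, with values spread uniformly inside a bucket.
    """
    n_buckets = len(bounds) - 1
    if value < bounds[0]:
        return 1.0
    if value >= bounds[-1]:
        return 0.0
    i = bisect_right(bounds, value) - 1  # index of the bucket containing value
    lo, hi = bounds[i], bounds[i + 1]
    within = (value - lo) / (hi - lo)    # share of bucket i that is <= value
    below = (i + within) / n_buckets     # share of the whole table that is <= value
    return 1.0 - below


if __name__ == "__main__":
    bounds = [1990, 2000, 2005, 2008, 2010, 2012]  # hypothetical histogram_bounds
    print(round(frac_greater_than(bounds, 2009), 3))  # 0.3
```

Multiplying that fraction by the table's row count gives the row estimate shown in EXPLAIN. If roughly 30% of the table is expected to match, a sequential scan is the natural choice.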

Answer by Gaurav Neema

In an index scan, the read head jumps from one row to another, which is about 1000 times slower than reading the next physical block (as in a sequential scan).


So, if (number of records to be retrieved * 1000) is less than the total number of records, an index scan will perform better.

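The rule of thumb above translates directly into a one-line check. This is a minimal Python sketch; the function name and the penalty parameter are ours, and 1000 is the answer's figure rather than a value PostgreSQL uses. Note that it puts the break-even point at 0.1% of the table (total_rows / 1000 rows), much lower than the 5-10% suggested in the first answer.

```python
def index_scan_preferred(rows_to_fetch: int, total_rows: int, penalty: int = 1000) -> bool:
    """Rule of thumb: an index scan wins when rows_to_fetch * penalty < total_rows.

    penalty is the answer's claimed slowdown of a random jump versus reading the
    next physical block; it is an assumption, not a measured value.
    """
    return rows_to_fetch * penalty < total_rows


if __name__ == "__main__":
    total = 1_000_000  # hypothetical table size
    print(index_scan_preferred(500, total))    # True: 500 * 1000 = 500,000 < 1,000,000
    print(index_scan_preferred(2_000, total))  # False: 2,000,000 >= 1,000,000
```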

Answer by Shitij Goyal

@a_horse_with_no_name explained it quite well. Also, if you really want to use an index scan, you should generally use bounded ranges in the WHERE clause, e.g. year > 2019 AND year < 2020.


Often the statistics on a table are not up to date, and it may not be possible to update them due to constraints. In this case, the optimizer does not know how many rows satisfy year > 2019, so in the absence of full knowledge it chooses a sequential scan. Bounded ranges will solve the problem most of the time.

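Why bounded ranges help without statistics: as far as we know, when a column has no statistics PostgreSQL falls back to built-in default selectivities, giving an open-ended inequality such as year > 2019 DEFAULT_INEQ_SEL (one third of the table) and a range with both bounds DEFAULT_RANGE_INEQ_SEL (0.5%), both defined in the planner's selfuncs.h. Treat the exact values as assumptions to check against your server version. The Python sketch below only mimics that fallback arithmetic; the function and the 300,000-row table are hypothetical.

```python
# Assumed mirror of two PostgreSQL planner fallback constants
# (DEFAULT_INEQ_SEL and DEFAULT_RANGE_INEQ_SEL in selfuncs.h); verify for your version.
DEFAULT_INEQ_SEL = 1.0 / 3.0
DEFAULT_RANGE_INEQ_SEL = 0.005


def estimated_rows_without_stats(table_rows: int, has_lower: bool, has_upper: bool) -> float:
    """Estimate rows matching a range predicate on a column that has no statistics."""
    if has_lower and has_upper:
        selectivity = DEFAULT_RANGE_INEQ_SEL  # e.g. year > 2019 AND year < 2020
    elif has_lower or has_upper:
        selectivity = DEFAULT_INEQ_SEL        # e.g. year > 2019
    else:
        selectivity = 1.0                     # no range condition: every row qualifies
    return table_rows * selectivity


if __name__ == "__main__":
    rows = 300_000  # hypothetical table size
    print(round(estimated_rows_without_stats(rows, True, False)))  # 100000
    print(round(estimated_rows_without_stats(rows, True, True)))   # 1500
```

With a third of the table expected to match, a sequential scan is the natural choice; with an estimate of 0.5% the index becomes a much more attractive option. Running ANALYZE replaces these guesses with real statistics.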