Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14407719/

PostgreSQL index not used for query on range

Tags: postgresql, database-design, indexing, between

Asked by Zain Zafar

I'm using PostgreSQL (9.2.0) and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips
(
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
);

I've added indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num
  ON ips
  USING btree
  (begin_ip_num );

CREATE INDEX index_ips_on_end_ip_num
  ON ips
  USING btree
  (end_ip_num );

The query being used is:

SELECT "ips".* FROM "ips" WHERE (3065106743 BETWEEN begin_ip_num AND end_ip_num);

The problem is that my BETWEEN query only uses the index on begin_ip_num. After using that index, it filters the result on end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
  Index Cond: (3065106743::bigint >= begin_ip_num)
  Filter: (3065106743::bigint <= end_ip_num)
  Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices, including a composite index on both begin_ip_num and end_ip_num.

Answered by Erwin Brandstetter

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the point from which the rest of the index fulfills the first condition. Then, for each value of the first column, it can return all rows that fulfill the second condition until the first one fails, then jump to the next value of the first column, and so on.
This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. A lot depends on your data distribution.
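
To see which of the two strategies the planner picks, a minimal sketch is to look at the plan once the multicolumn index exists. The literal and the names come from the question; the plan you get depends on your data and statistics, so treat this as a way to check rather than a promised result.

```sql
-- Assumes index_ips_begin_end_ip_num from above has already been created.
-- "x BETWEEN a AND b" is shorthand for "x >= a AND x <= b"; writing the two
-- comparisons out makes it easier to match them against the plan.
EXPLAIN ANALYZE
SELECT *
FROM   ips
WHERE  begin_ip_num <= 3065106743
AND    end_ip_num   >= 3065106743;
```

If both comparisons show up under "Index Cond" the multicolumn index is being used for both columns; if one of them appears under "Filter", Postgres is filtering on it after the index scan, as in the question.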

What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.
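
One way to try this without adding a column is to index the range as an expression. This is only a sketch: the index name is made up for illustration, and it assumes every row has begin_ip_num <= end_ip_num, because the int8range constructor raises an error when the lower bound is greater than the upper bound.

```sql
-- Hypothetical index name; a GiST index over a computed, inclusive int8range.
CREATE INDEX index_ips_on_range_expr ON ips
USING gist (int8range(begin_ip_num, end_ip_num, '[]'));

-- The query has to repeat the indexed expression for the index to be usable.
-- "@>" is the range "contains element" operator.
SELECT *
FROM   ips
WHERE  int8range(begin_ip_num, end_ip_num, '[]') @> 3065106743::bigint;
```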

Barring that, you can check out this closely related answer on dba.SE, which uses a rather sophisticated regime of partial indexes. Advanced stuff, but it delivers great performance.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num;

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. That can help performance a lot if you have many rows per value of the first column. Otherwise it is hardly effective.
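
A rough way to check what CLUSTER did, assuming it has already been run as shown above. CLUSTER is a one-time reorder that locks the table while it runs, and rows added later are not kept in order, so it may need repeating.

```sql
-- Refresh statistics so the planner (and pg_stats) see the new physical order.
ANALYZE ips;

-- correlation close to 1 or -1 means the on-disk row order follows that column.
SELECT attname, correlation
FROM   pg_stats
WHERE  tablename = 'ips'
AND    attname IN ('begin_ip_num', 'end_ip_num');

-- Later on, re-cluster using the index remembered from the previous CLUSTER.
CLUSTER ips;
```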

Also, is autovacuum running, or have you run ANALYZE on the table? Postgres needs current statistics to pick appropriate query plans.
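
Both questions can be answered from psql. A short sketch, using the table name from the question:

```sql
-- Is autovacuum enabled on this server?
SHOW autovacuum;

-- When were statistics last gathered for the table, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'ips';

-- Gather fresh statistics right now.
ANALYZE ips;
```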

Answered by pbnelson

I had exactly the same problem on a nearly identical dataset, from maxmind.com's free GeoIP table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them here.

First, add a new column of a range type; note that int8range is required for bigint values. Next, set its values; the '[]' argument makes the range inclusive at both the lower and upper bounds (see the manual). Finally, add the index; the GiST index is where all the performance advantage comes from.

alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);

Having laid the groundwork, you can now use the '<@' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange);
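
A few follow-up statements that may be useful with this setup, as a sketch using the names from the statements above. Note that nothing here keeps iprange in sync automatically; rows inserted later need it set as well.

```sql
-- Confirm that the GiST index index_ips_on_iprange is actually used.
EXPLAIN ANALYZE
SELECT * FROM ips WHERE 3065106743::bigint <@ iprange;

-- Equivalent predicate with the "contains" operator and the range on the left.
SELECT * FROM ips WHERE iprange @> 3065106743::bigint;

-- Backfill rows that were added after the initial UPDATE.
UPDATE ips
SET    iprange = int8range(begin_ip_num, end_ip_num, '[]')
WHERE  iprange IS NULL;
```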

Answered by Derek

I'm a bit late to this party, but this is what works really well for me.

Consider installing the ip4r extension. It lets you define a column that holds IP ranges. The name suggests it is only for IPv4, but it currently supports IPv6 as well.

After you populate the table with ranges in that column, all you need to do is create a GiST index:

CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);

I have almost 10 million ranges in my database, but queries take a fraction of a millisecond:

region=> select count(*) from ip_zip ;

  count  
---------
 9566133

region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
   Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
         Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
 Planning time: 0.106 ms
 Execution time: 0.118 ms
(7 rows)

region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
   Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
         Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
 Planning time: 0.102 ms
 Execution time: 0.145 ms
(7 rows)
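
For anyone starting from scratch, the setup might look roughly like the sketch below. The table ip_demo, its columns, and the sample ranges are made up for illustration and are not the ip_zip table from this answer; it also assumes the ip4r extension is installed on the server.

```sql
-- Assumes the ip4r extension package is available to this database.
CREATE EXTENSION ip4r;

-- Hypothetical demo table with a single ip4r range column.
CREATE TABLE ip_demo (
  id        serial PRIMARY KEY,
  ip4_range ip4r NOT NULL
);

-- ip4r accepts CIDR notation as well as explicit "low-high" ranges.
INSERT INTO ip_demo (ip4_range)
VALUES ('8.8.8.0/24'), ('10.0.0.0-10.0.0.255');

CREATE INDEX ip_demo_ip4_range ON ip_demo USING gist (ip4_range);

-- Same lookup shape as the queries above: "is this address within the range?"
SELECT * FROM ip_demo WHERE '8.8.8.8'::ip4 <<= ip4_range;
```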

Answered by a1ex07

I believe your query looks like WHERE [constant] BETWEEN begin_ip_num AND end_ip_num.

As far as I know, Postgres doesn't have an "AND-EQUAL" access plan, so you need to add a composite index on the two columns, as suggested by Erwin Brandstetter.