PostgreSQL 可以索引数组列吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4058731/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Can PostgreSQL index array columns?
提问by IamIC
I can't find a definite answer to this question in the documentation. If a column is an array type, will all the entered values be individually indexed?
我在文档中找不到这个问题的明确答案。如果列是数组类型,是否所有输入的值都会被单独索引?
I created a simple table with one int[]
column, and put a unique index on it. I noticed that I couldn't add the same array of ints, which leads me to believe the index is a composite of the array items, not an index of each item.
我创建了一个包含一int[]
列的简单表,并在其上放置了一个唯一索引。我注意到我无法添加相同的整数数组,这让我相信索引是数组项的组合,而不是每个项的索引。
INSERT INTO "Test"."Test" VALUES ('{10, 15, 20}');
INSERT INTO "Test"."Test" VALUES ('{10, 20, 30}');
SELECT * FROM "Test"."Test" WHERE 20 = ANY ("Column1");
Is the index helping this query?
索引对这个查询有帮助吗?
回答by Frank Heikens
Yes you can index an array, but you have to use the array operatorsand the GIN-index type.
是的,您可以索引数组,但您必须使用数组运算符和GIN-index type。
Example:
例子:
CREATE TABLE "Test"("Column1" int[]);
INSERT INTO "Test" VALUES ('{10, 15, 20}');
INSERT INTO "Test" VALUES ('{10, 20, 30}');
CREATE INDEX idx_test on "Test" USING GIN ("Column1");
-- To enforce index usage because we have only 2 records for this test...
SET enable_seqscan TO off;
EXPLAIN ANALYZE
SELECT * FROM "Test" WHERE "Column1" @> ARRAY[20];
Result:
结果:
Bitmap Heap Scan on "Test" (cost=4.26..8.27 rows=1 width=32) (actual time=0.014..0.015 rows=2 loops=1)
Recheck Cond: ("Column1" @> '{20}'::integer[])
-> Bitmap Index Scan on idx_test (cost=0.00..4.26 rows=1 width=0) (actual time=0.009..0.009 rows=2 loops=1)
Index Cond: ("Column1" @> '{20}'::integer[])
Total runtime: 0.062 ms
笔记
it appears that in many cases the gin__int_opsoption is required
似乎在许多情况下需要gin__int_ops选项
create index <index_name> on <table_name> using GIN (<column> gin__int_ops)
I have not yet seen a case where it would work with the && and @> operator without the gin__int_ops options
我还没有看到在没有 gin__int_ops 选项的情况下它可以与 && 和 @> 运算符一起使用的情况
回答by Erwin Brandstetter
@Tregoreg raised a questionin the comment to his offered bounty:
@Tregoreg在对他提供的赏金的评论中提出了一个问题:
I didn't find the current answers working. Using GIN index on array-typed column does not increase the performance of ANY() operator. Is there really no solution?
我没有发现当前的答案有效。在数组类型的列上使用 GIN 索引不会提高 ANY() 运算符的性能。真的没有解决办法吗?
@Frank's accepted answertells you to use array operators, which is still correctfor Postgres 11. The manual:
@Frank 接受的答案告诉您使用数组运算符,这对于 Postgres 11仍然是正确的。手册:
... the standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators:
<@ @> = &&
... PostgreSQL 的标准发行版包括一个用于数组的 GIN 运算符类,它支持使用这些运算符的索引查询:
<@ @> = &&
The complete list of built-in operator classes for GIN indexes in the standard distribution is here.
In Postgres indexes are bound to operators(which are implemented for certain types), not data types alone or functions or anything else. That's a heritage from the original Berkeley design of Postgresand very hard to change now. And it's generally working just fine. Here is a thread on pgsql-bugs with Tom Lane commenting on this.
在 Postgres 中,索引绑定到运算符(为某些类型实现),而不是单独的数据类型或函数或其他任何东西。这是 Postgres 最初的伯克利设计的遗产,现在很难改变。它通常工作得很好。这是一个关于 pgsql-bugs 的线程,Tom Lane 对此进行了评论。
Some PostGis functions(like ST_DWithin()
) seem to violate this principal, but that is not so. Those functions are rewritten internally to use respective operators.
一些 PostGis函数(如ST_DWithin()
)似乎违反了这一原则,但事实并非如此。这些函数在内部被重写以使用各自的运算符。
The indexed expression must be to the leftof the operator. For most operators (including all of the above) the query planner can achieve this by flipping operands if you place the indexed expression to the right - given that a COMMUTATOR
has been defined. The ANY
constructcan be used in combination with various operators and is not an operator itself. When used as constant = ANY (array_expression)
only indexes supporting the =
operator on array elementswould qualify and we would need a commutator for = ANY()
. GIN indexes are out.
索引表达式必须位于运算符的左侧。对于大多数运算符(包括以上所有运算符),如果将索引表达式放在右侧,查询规划器可以通过翻转操作数来实现这一点——假设 aCOMMUTATOR
已经定义。该ANY
构造可以与各种运算符结合使用,并且本身不是运算符。当constant = ANY (array_expression)
仅用作支持数组元素=
运算符的索引时,我们将需要一个用于. GIN 索引已出。= ANY()
Postgres is not currently smart enough to derive a GIN-indexable expression from it. For starters, constant = ANY (array_expression)
is not completely equivalentto array_expression @> ARRAY[constant]
. Array operators return an error if any NULL elementsare involved, while the ANY
construct can deal with NULL on either side. And there are different results for data type mismatches.
Postgres 目前还不够智能,无法从中导出 GIN 可索引的表达式。对于初学者来说,constant = ANY (array_expression)
是不完全等同于array_expression @> ARRAY[constant]
。如果涉及任何 NULL元素,数组运算符将返回错误,而ANY
构造可以在任一侧处理 NULL。并且对于数据类型不匹配有不同的结果。
Related answers:
相关回答:
Asides
旁白
While working with integer
arrays(int4
, not int2
or int8
) without NULL
values (like your example implies) consider the additional module intarray
, that provides specialized, faster operators and index support. See:
在处理没有值的integer
数组( int4
, not int2
or int8
) 时NULL
(如您的示例所暗示的那样),请考虑附加模块intarray
,它提供专门的、更快的运算符和索引支持。看:
- How to create an index for elements of an array in PostgreSQL?
- Compare arrays for equality, ignoring order of elements
As for the UNIQUE
constraint in your question that went unanswered: That's implemented with a btree index on the whole arrayvalue (like you suspected) and does not help with the search for elementsat all. Details:
至于UNIQUE
您的问题中没有得到解答的约束:这是通过对整个数组值的 btree 索引实现的(就像您怀疑的那样),并且根本没有帮助搜索元素。细节:
回答by Ed4
It's now possible to index the individual array elements. For example:
现在可以索引单个数组元素。例如:
CREATE TABLE test (foo int[]);
INSERT INTO test VALUES ('{1,2,3}');
INSERT INTO test VALUES ('{4,5,6}');
CREATE INDEX test_index on test ((foo[1]));
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * from test WHERE foo[1]=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using test_index on test (cost=0.00..8.27 rows=1 width=32) (actual time=0.070..0.071 rows=1 loops=1)
Index Cond: (foo[1] = 1)
Total runtime: 0.112 ms
(3 rows)
This works on at least Postgres 9.2.1. Note that you need to build a separate index for each array index, in my example I only indexed the first element.
这至少适用于 Postgres 9.2.1。请注意,您需要为每个数组索引建立一个单独的索引,在我的示例中,我只索引了第一个元素。