为什么 PostgreSQL Text-Search GiST 索引比 GIN 索引慢这么多?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1540374/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why are PostgreSQL Text-Search GiST indexes so much slower than GIN indexes?
提问by Bill Karwin
I'm testing out the PostgreSQL Text-Search features, using the September data dump from StackOverflow as sample data. :-)
我正在测试 PostgreSQL 文本搜索功能,使用来自 StackOverflow 的 9 月数据转储作为示例数据。:-)
The naive approach of using LIKE
predicates or POSIX regular expression matching to search 1.2 million rows takes about 90-105 seconds(on my Macbook) to do a full table-scan searching for a keyword.
使用LIKE
谓词或 POSIX 正则表达式匹配来搜索 120 万行的幼稚方法需要大约90-105 秒(在我的 Macbook 上)来执行全表扫描搜索关键字。
SELECT * FROM Posts WHERE body LIKE '%postgresql%';
SELECT * FROM Posts WHERE body ~ 'postgresql';
An unindexed, ad hoc text-search query takes about 8 minutes:
未编入索引的临时文本搜索查询大约需要8 分钟:
SELECT * FROM Posts WHERE to_tsvector(body) @@ to_tsquery('postgresql');
Creating a GIN index takes about 40 minutes:
创建 GIN 索引大约需要40 分钟:
ALTER TABLE Posts ADD COLUMN PostText TSVECTOR;
UPDATE Posts SET PostText = to_tsvector(body);
CREATE INDEX PostText_GIN ON Posts USING GIN(PostText);
(I realize I could also do this in one step by defining it as an expression index.)
(我意识到我也可以通过将其定义为表达式索引来一步完成此操作。)
Afterwards, a query assisted by a GIN index runs a lot faster -- this takes about 40 milliseconds:
之后,由 GIN 索引辅助的查询运行得更快——这大约需要40 毫秒:
SELECT * FROM Posts WHERE PostText @@ 'postgresql';
However, when I create a GiST index, the results are quite different. It takes less than 2 minutesto create the index:
但是,当我创建 GiST 索引时,结果却大不相同。创建索引只需不到2分钟:
CREATE INDEX PostText_GIN ON Posts USING GIST(PostText);
Afterwards, a query using the @@
text-search operator takes 90-100 seconds. So GiST indexes do improve an unindexed TS query from 8 minutes to 1.5 minutes. But that's no improvement over doing a full table-scan with LIKE
. It's useless in a web programming environment.
之后,使用@@
文本搜索运算符的查询需要90-100 秒。因此,GiST 索引确实将未索引的 TS 查询从 8 分钟提高到 1.5 分钟。但这与使用LIKE
. 它在 Web 编程环境中是无用的。
Am I missing something crucial to using GiST indexes? Do the indexes need to be pre-cached in memory or something? I am using a plain PostgreSQL installation from MacPorts, with no tuning.
我是否遗漏了使用 GiST 索引的关键信息?索引是否需要预先缓存在内存中?我使用的是 MacPorts 的普通 PostgreSQL 安装,没有进行调整。
What is the recommended way to use GiST indexes? Or does everyone doing TS with PostgreSQL skip GiST indexes and use only GIN indexes?
使用 GiST 索引的推荐方法是什么?还是每个用 PostgreSQL 做 TS 的人都跳过 GiST 索引而只使用 GIN 索引?
PS: I do know about alternatives like Sphinx Search and Lucene. I'm just trying to learn about the features provided by PostgreSQL itself.
PS:我确实知道像 Sphinx Search 和 Lucene 这样的替代品。我只是想了解 PostgreSQL 本身提供的功能。
采纳答案by Jonathan Feinberg
try
尝试
CREATE INDEX PostText_GIST ON Posts USING GIST(PostText varchar_pattern_ops);
which creates an index suitable for prefix queries. See the PostgreSQL docs on Operator Classes and Operator Families. The @@ operator is only sensible on term vectors; the GiST index (with varchar_pattern_ops) will give excellent results with LIKE.
它创建了一个适合前缀查询的索引。请参阅有关Operator Classes 和 Operator Families的 PostgreSQL 文档。@@ 运算符仅对术语向量有效;GiST 索引(带有 varchar_pattern_ops)将在 LIKE 中提供出色的结果。
回答by mattonrails
The docs have a nice overview of the performance differences between GiST and GIN indexes if you're interested: GiST and GIN Index Types.
如果您感兴趣,这些文档很好地概述了 GiST 和 GIN 索引之间的性能差异:GiST 和 GIN 索引类型。
回答by John
btw: if this hasn't been answered to your satisfaction yet, the part where you did
顺便说一句:如果这还没有得到您满意的答复,那么您所做的部分
SELECT * FROM Posts WHERE PostText @@ 'postgresql';
SELECT * FROM Posts WHERE PostText @@ 'postgresql';
should have been
本来应该
SELECT * FROM Posts WHERE PostText @@ to_tsquery('postgresql');
SELECT * FROM Posts WHERE PostText @@ to_tsquery('postgresql');