Original question: http://stackoverflow.com/questions/17633344/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Is there a way to index in postgres for fast substring searches
Asked by dan
I have a database and want to run searches against a table along the lines of:
select * from table where column like 'abc%def%ghi'
or
select * from table where column like '%def%ghi'
Is there a way to index the column so that this isn't too slow?
Edit: To clarify, the database is effectively read-only and won't be updated often.
Answered by Craig Ringer
Options for text search and indexing include:
- Full-text indexing with dictionary-based search, including support for prefix search, e.g.
  to_tsvector(mycol) @@ to_tsquery('search:*')
- text_pattern_ops indexes to support prefix string matches, e.g. LIKE 'abc%', but not infix searches like %blah%. A reverse()d index may be used for suffix searching.
- pg_trgm trigram indexes on newer versions, as demonstrated in this recent dba.stackexchange.com post.
- An external search and indexing tool like Apache Solr.
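As a rough sketch of the first two options, the statements below build a full-text index and a reversed-string index. The table and column names (docs, body) and the 'english' text search configuration are assumptions made for illustration; they do not come from the question.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body text);

-- Full-text search: a GIN expression index over the tsvector.
-- The two-argument to_tsvector form is used because an index
-- expression must be immutable.
CREATE INDEX docs_body_fts_idx
    ON docs USING gin (to_tsvector('english', body));

-- Prefix match on a word, written with the same expression as the index.
SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search:*');

-- Suffix search: index the reversed string, then do a prefix match on it.
CREATE INDEX docs_body_rev_idx
    ON docs (reverse(body) text_pattern_ops);

-- Intended to return the same rows as: body LIKE '%ghi'
SELECT * FROM docs
WHERE reverse(body) LIKE 'ihg%';   -- 'ihg' is 'ghi' reversed
```

Neither index helps with a pattern such as '%def%ghi' as a whole: full-text search matches words rather than arbitrary substrings, and the reversed index only covers the fixed suffix.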
From the minimal information given above, I'd say that only a trigram index will be able to help you, since you're doing infix searches on a string and not looking for dictionary words. Unfortunately, trigram indexes are huge and rather inefficient; don't expect some kind of magical performance boost, and keep in mind that they take a lot of work for the database engine to build and keep up to date.
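A minimal sketch of the trigram approach, assuming the pg_trgm extension is available on the server and using a hypothetical table docs with a text column body (neither name comes from the question):

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body text);

-- pg_trgm ships as a contrib extension; creating it may need elevated rights.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN index over the trigrams of the column.
CREATE INDEX docs_body_trgm_idx
    ON docs USING gin (body gin_trgm_ops);

-- Both query shapes from the question can be considered for this index.
-- EXPLAIN shows whether the planner actually chooses it for your data.
EXPLAIN SELECT * FROM docs WHERE body LIKE 'abc%def%ghi';
EXPLAIN SELECT * FROM docs WHERE body LIKE '%def%ghi';
```

The index narrows the search using the three-character sequences found in the pattern, so patterns whose literal parts are shorter than three characters are likely to get little or no benefit.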
Answered by rogerdpack
If you just need to, for instance, get the unique substrings at a fixed position across an entire table, you can create a substring index:
CREATE INDEX i_test_sbstr ON tablename (substring(columname, 5, 3));
-- start at position 5, go for 3 characters
It is important that the substring() parameters in the index definition are
the same as you use in your query.
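To illustrate the point about matching parameters, here is a sketch that reuses the tablename and columname placeholders from the index definition above; the literal 'abc' is an arbitrary example value.

```sql
-- Assumes the table and the i_test_sbstr index defined above already exist.

-- Same expression as the index definition, so the index is a candidate.
SELECT * FROM tablename
WHERE substring(columname, 5, 3) = 'abc';

-- Listing the distinct substrings also uses the indexed expression.
SELECT DISTINCT substring(columname, 5, 3) FROM tablename;

-- Different start position: the expression no longer matches the
-- index definition, so this index does not apply.
SELECT * FROM tablename
WHERE substring(columname, 4, 3) = 'abc';
```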
Answered by Clodoaldo Neto
For the like operator, use one of the operator classes varchar_pattern_ops or text_pattern_ops:
create index test_index on test_table (col varchar_pattern_ops);
That will only work if the pattern does not start with a %; if it does, another strategy is required.
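As a sketch of the difference, the two patterns from the question can be checked against the index above with EXPLAIN. Whether the planner actually picks the index also depends on table size and statistics, so treat the comments as expectations rather than guarantees.

```sql
-- Assumes test_table and test_index from the statement above already exist.

-- Left-anchored pattern: the fixed prefix 'abc' can be turned into a
-- range scan on the index, with the rest of the pattern applied as a filter.
EXPLAIN SELECT * FROM test_table WHERE col LIKE 'abc%def%ghi';

-- Leading wildcard: there is no fixed prefix, so this B-tree index
-- cannot narrow the search.
EXPLAIN SELECT * FROM test_table WHERE col LIKE '%def%ghi';
```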