PostgreSQL: searching individual words in a string
Original question: http://stackoverflow.com/questions/19369458/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Searching individual words in a string
Asked by Erwin Brandstetter
I know about full-text search, but that only matches your query against individual words. I want to select strings that contain a word that starts with words in my query. For example, if I search:
appl
the following should match:
a really nice application
apples are cool
appliances
since all those strings contain words that start with appl. In addition, it would be nice if I could select the number of words that match, and sort based on that.
How can I implement this in PostgreSQL?
Accepted answer by Hamza Kubba
SELECT * FROM some_table WHERE some_field LIKE 'appl%' OR some_field LIKE '% appl%';
As for counting the number of words that match, I believe that would be too expensive to do dynamically in Postgres (though maybe someone else knows better). One way you could do it is by writing a function that counts occurrences in a string, and then adding ORDER BY myFunction('appl', some_field). Again though, this method is VERY expensive (i.e. slow) and not recommended.
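To make the idea concrete, here is a minimal sketch of what such a counting function could compute. It is not PostgreSQL code: it uses Python's built-in sqlite3 module as a runnable stand-in, and the function name count_prefix_words, the sample rows, and the use of Python's \b word boundary are all assumptions made for illustration. The cost warning above still applies, because the function runs once for every row.

```python
import re
import sqlite3


def count_prefix_words(prefix, text):
    """Count the words in `text` that start with `prefix` (case-insensitive)."""
    # \b is Python's word-boundary assertion; re.escape keeps the prefix literal.
    pattern = r"\b" + re.escape(prefix)
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# In-memory SQLite database, used only so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")

# Expose the Python function to SQL under a hypothetical name.
conn.create_function("count_prefix_words", 2, count_prefix_words)

conn.execute("CREATE TABLE some_table (some_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (some_field) VALUES (?)",
    [
        ("a really nice application",),
        ("apples are cool, and so are appliances",),
        ("pineapple juice",),  # contains "appl", but not at the start of a word
    ],
)

# Keep only rows with at least one matching word, best matches first.
rows = conn.execute(
    """
    SELECT some_field, count_prefix_words(?, some_field) AS matches
    FROM some_table
    WHERE count_prefix_words(?, some_field) > 0
    ORDER BY matches DESC
    """,
    ("appl", "appl"),
).fetchall()

for some_field, matches in rows:
    print(matches, some_field)
```

The function is evaluated for every row in both the WHERE and the ORDER BY clause, which is exactly the per-row work that makes this approach slow on large tables.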
For things like that, you should probably use a separate/complementary full-text search engine like Sphinx Search (google it), which is specialized for that sort of thing.
An alternative is to have another table that contains keywords and the number of occurrences of those keywords in each string. This means you need to store each phrase you have (e.g. really really nice application) and also store the keywords in another table (i.e. really, 2; nice, 1; application, 1) and link that keyword table to your full-phrase table. This means that you would have to break up strings into keywords as they are entered into your database and store them in two places. This is a typical space vs speed trade-off.
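The keyword-table layout described above could look roughly like the following sketch. It again uses Python's sqlite3 module as a stand-in so that it runs without a server; the table names phrase and phrase_keyword, the column names, the helper functions, and the \w+ tokenizer are all hypothetical choices, not part of the original answer.

```python
import re
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE phrase (
        phrase_id INTEGER PRIMARY KEY,
        body      TEXT NOT NULL
    );
    -- One row per distinct keyword in each phrase, with its count.
    CREATE TABLE phrase_keyword (
        phrase_id   INTEGER NOT NULL REFERENCES phrase (phrase_id),
        keyword     TEXT    NOT NULL,
        occurrences INTEGER NOT NULL,
        PRIMARY KEY (phrase_id, keyword)
    );
    CREATE INDEX phrase_keyword_keyword_idx ON phrase_keyword (keyword);
    """
)


def add_phrase(conn, body):
    """Store the phrase, then one row per distinct keyword with its count."""
    phrase_id = conn.execute(
        "INSERT INTO phrase (body) VALUES (?)", (body,)
    ).lastrowid
    # Assumed tokenizer: lowercase the text and split on non-word characters.
    counts = Counter(re.findall(r"\w+", body.lower()))
    conn.executemany(
        "INSERT INTO phrase_keyword (phrase_id, keyword, occurrences) "
        "VALUES (?, ?, ?)",
        [(phrase_id, word, n) for word, n in counts.items()],
    )
    return phrase_id


def search_prefix(conn, prefix):
    """Return (body, matches) for phrases with a keyword starting with prefix."""
    # Note: % and _ inside `prefix` are not escaped in this sketch.
    return conn.execute(
        """
        SELECT p.body, SUM(k.occurrences) AS matches
        FROM phrase_keyword AS k
        JOIN phrase AS p ON p.phrase_id = k.phrase_id
        WHERE k.keyword LIKE ? || '%'
        GROUP BY p.phrase_id, p.body
        ORDER BY matches DESC, p.body
        """,
        (prefix.lower(),),
    ).fetchall()


first_id = add_phrase(conn, "really really nice application")
add_phrase(conn, "apples are cool")
add_phrase(conn, "appliances and apples")

results = search_prefix(conn, "appl")
for body, matches in results:
    print(matches, body)
```

Because the counts are computed once at insert time, the search only has to look up keywords by prefix and add up stored numbers. The price is the extra table and the work of keeping it in sync whenever a phrase changes.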
Answer by Erwin Brandstetter
Prefix matching with Full Text Search
Revisiting this question some years later, it strikes me that FTS does support prefix matching. Your query could work like this:
SELECT * FROM tbl
WHERE to_tsvector('simple', string) @@ to_tsquery('simple', 'appl:*');
Note the appended :* in the tsquery.
Alternative with regular expressions
SELECT * FROM tbl
WHERE string ~ '\mappl';
Quoting the manual here:
\m .. matches only at the beginning of a word
To order by the count of matches, you could use regexp_matches():
SELECT tbl_id, count(*) AS matches
FROM (
SELECT tbl_id, regexp_matches(string, '\mappl', 'g')
FROM tbl
WHERE string ~ '\mappl'
) sub
GROUP BY 1
ORDER BY 2 DESC;
Or with regexp_split_to_table(), where the number of pieces minus one is the number of matches:
SELECT tbl_id, string, count(*) - 1 AS matches
FROM (
SELECT tbl_id, string, regexp_split_to_table(string, '\mappl')
FROM tbl
WHERE string ~ '\mappl'
) sub
GROUP BY 1, 2
ORDER BY 3 DESC, 2, 1;
SQL Fiddle demonstrating all three.
Postgres 9.3 even offers index support for simple regular expressions, with a trigram GIN or GiST index (quoting the release notes):
Add support for indexing of regular-expression searches in pg_trgm (Alexander Korotkov)