PostgreSQL: search for individual words in a string

Question

Asked by Erwin Brandstetter

I know about full-text search, but that only matches your query against whole words. I want to select strings that contain a word starting with one of the words in my query. For example, if I search:

appl

the following should match:

a really nice application
apples are cool
appliances

since all those strings contain words that start with appl. In addition, it would be nice if I could select the number of words that match, and sort based on that.

How can I implement this in PostgreSQL?

Answer 1

Accepted answer by Hamza Kubba

SELECT * FROM some_table WHERE some_field LIKE 'appl%' OR some_field LIKE '% appl%';

As for counting the number of words that match, I believe that would be too expensive to do dynamically in Postgres (though maybe someone else knows better). One way you could do it is by writing a function that counts occurrences in a string, and then adding ORDER BY myFunction('appl', some_field). Again though, this method is VERY expensive (i.e. slow) and not recommended.
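
A minimal sketch of such a function, for illustration only. The name count_prefix_words is hypothetical (it stands in for myFunction above). It relies on the Postgres regex escape \m, which matches at the beginning of a word, and it assumes the prefix contains no regex metacharacters, because the prefix is concatenated into the pattern unescaped:

```sql
-- Hypothetical helper: number of words in txt that start with prefix.
-- Assumption: prefix holds no regex metacharacters (it is not escaped).
CREATE OR REPLACE FUNCTION count_prefix_words(prefix text, txt text)
  RETURNS integer
  LANGUAGE sql IMMUTABLE AS
$func$
SELECT count(*)::int
FROM   regexp_matches(txt, '\m' || prefix, 'g');  -- one row per match
$func$;

-- Usage: sort the matching rows by how many words start with 'appl'.
SELECT *
FROM   some_table
WHERE  some_field LIKE 'appl%' OR some_field LIKE '% appl%'
ORDER  BY count_prefix_words('appl', some_field) DESC;
```

The function runs for every row that passes the WHERE clause, which is where the cost mentioned above comes from.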

For things like that, you should probably use a separate/complementary full-text search engine like Sphinx Search (google it), which is specialized for that sort of thing.

An alternative is to have another table that contains keywords and the number of occurrences of each keyword in each string. This means you need to store each phrase you have (e.g. really really nice application), also store the keywords in another table (i.e. really, 2, nice, 1, application, 1), and link that keyword table to your full-phrase table. You would have to break strings up into keywords as they are entered into your database and store them in two places. This is a typical space vs speed trade-off.
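
One way the two-table layout could look, as a sketch. All table, column, and index names here are hypothetical, and splitting phrases into keywords at insert time is left to the application:

```sql
-- Full phrases, stored once.
CREATE TABLE phrase (
  phrase_id serial PRIMARY KEY,
  phrase    text NOT NULL
);

-- One row per distinct keyword per phrase, with its count,
-- e.g. (1, 'really', 2), (1, 'nice', 1), (1, 'application', 1).
CREATE TABLE phrase_keyword (
  phrase_id   int  NOT NULL REFERENCES phrase,
  keyword     text NOT NULL,
  occurrences int  NOT NULL,
  PRIMARY KEY (phrase_id, keyword)
);

-- text_pattern_ops lets a btree index serve left-anchored LIKE
-- patterns even when the database does not use the C locale.
CREATE INDEX phrase_keyword_prefix_idx
  ON phrase_keyword (keyword text_pattern_ops);

-- Phrases containing words that start with 'appl', best match first.
SELECT p.phrase_id, p.phrase, sum(k.occurrences) AS matches
FROM   phrase_keyword k
JOIN   phrase p USING (phrase_id)
WHERE  k.keyword LIKE 'appl%'
GROUP  BY p.phrase_id, p.phrase
ORDER  BY matches DESC;
```

The prefix search then touches only the keyword table, at the price of maintaining the extra rows on every insert or update.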

Answer 2

Answer by Erwin Brandstetter

Prefix matching with Full Text Search

Revisiting this question some years later, it strikes me that FTS does support prefix matching. Your query could work like this:

SELECT * FROM tbl
WHERE  to_tsvector('simple', string) @@ to_tsquery('simple', 'appl:*');

Note the appended :* in the tsquery. Details:

Get partial match from GIN indexed TSVECTOR column
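
For larger tables, the prefix query above can be backed by an expression index. A sketch, assuming the same tbl and string names as in the query; the index name is made up:

```sql
-- GIN index on the same expression the query uses.
-- The two-argument form of to_tsvector() is required in an index,
-- and the query must repeat the expression exactly to match it.
CREATE INDEX tbl_string_fts_idx
  ON tbl USING gin (to_tsvector('simple', string));

-- Unchanged query; the planner can now consider the index.
SELECT * FROM tbl
WHERE  to_tsvector('simple', string) @@ to_tsquery('simple', 'appl:*');
```

Whether the index is actually used depends on table size and statistics; EXPLAIN shows the chosen plan.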

Alternative with regular expressions

SELECT * FROM tbl
WHERE  string ~ '\mappl';

Quoting the manual here:

\m.. matches only at the beginning of a word

To order by the count of matches, you could use regexp_matches():

SELECT tbl_id, count(*) AS matches
FROM  (
    SELECT tbl_id, regexp_matches(string, '\mappl', 'g')
    FROM   tbl
    WHERE  string ~ '\mappl'
    ) sub
GROUP  BY 1
ORDER  BY 2 DESC;

Or regexp_split_to_table():

SELECT tbl_id, string, count(*) - 1 AS matches
FROM  (
    SELECT tbl_id, string, regexp_split_to_table(string, '\mappl')
    FROM   tbl
    WHERE  string ~ '\mappl'
    ) sub
GROUP  BY 1, 2
ORDER  BY 3 DESC, 2, 1;

SQL Fiddle demonstrating all three.

Postgres 9.3 even offers index support for simple regular expressions, with a trigram GIN or GiST index (quoting the release notes):

Add support for indexing of regular-expression searches in pg_trgm (Alexander Korotkov)

Depesz wrote a blog post about the new feature.
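
A sketch of what such a trigram index could look like for the regular expression queries above, assuming Postgres 9.3 or later and that the pg_trgm additional module is available for installation. The index name is made up:

```sql
-- pg_trgm ships with Postgres as an additional module.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the searched column.
CREATE INDEX tbl_string_trgm_idx
  ON tbl USING gin (string gin_trgm_ops);

-- Same regular expression query as before; check with EXPLAIN
-- whether the planner picks the trigram index for it.
EXPLAIN
SELECT * FROM tbl
WHERE  string ~ '\mappl';
```

Very short patterns yield few or no trigrams, so the index helps most with longer prefixes.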
