PostgreSQL: match a phrase ending in a prefix with full text search

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/6155592/

Time: 2020-10-20 23:02:45  Source: igfitidea

Match a phrase ending in a prefix with full text search

postgresql full-text-search pattern-matching tsvector

Asked by itsame69

I'm looking for a way to emulate something like SELECT * FROM table WHERE attr LIKE '%text%' using a tsvector in PostgreSQL.

I've created a tsvector attribute without using a dictionary. Now, a query like ...

SELECT title
FROM table
WHERE title_tsv @@ to_tsquery('ph:*');

... would return all titles like 'Physics', 'PHP', etc. But how can I create a query that returns all records where the title starts with 'Zend Fram' (which should return, for instance, 'Zend Framework')?

Of course, I could use something like:

SELECT title
FROM table
WHERE title_tsv @@ to_tsquery('zend')
AND   title_tsv @@ to_tsquery('fram:*');

However, this seems a little awkward.

So, the question is: is there a way to formulate the query given above using something like:

SELECT title
FROM table
WHERE title_tsv @@ to_tsquery('zend fram:*');

Answered by Seth Robertson

SELECT title
FROM table
WHERE title_tsv @@ to_tsquery('zend') and
title_tsv @@ to_tsquery('fram:*')  

is equivalent to:

SELECT title
FROM table
WHERE title_tsv @@ to_tsquery('zend & fram:*')

but of course that finds "Zend has no framework" as well.

You could of course add a regular expression match against title on top of the tsquery match, but you would have to use EXPLAIN ANALYZE to make sure the regex is being evaluated after the tsquery match instead of before it.
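
As a rough sketch of that idea (not part of the original answer): assuming the table is called tbl, its title_tsv column was built with the 'simple' configuration and has a GIN index, the combined filter might look like the query below. The \m escape is PostgreSQL's "start of word" regex constraint. EXPLAIN ANALYZE prints the plan, so you can check that the tsquery condition drives the index scan and the regex is only applied as a filter on the rows that remain.

```sql
-- Hypothetical names: tbl, title, title_tsv (GIN-indexed tsvector column).
EXPLAIN ANALYZE
SELECT title
FROM   tbl
WHERE  title_tsv @@ to_tsquery('simple', 'zend & fram:*')  -- index-friendly prefilter
AND    title ~* '\mzend\s+fram';                           -- keep only "zend" directly followed by "fram..."
```

The regex here only allows whitespace between the two words; titles with punctuation in between would need a looser pattern.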

Answered by Erwin Brandstetter

Postgres 9.6 introduces phrase search capabilities for full text search. So this works now:

SELECT title
FROM  tbl
WHERE title_tsv @@ to_tsquery('zend <-> fram:*');

<-> being the FOLLOWED BY operator.

It finds 'foo Zend framework bar' or 'Zend frames', but not 'foo Zend has no framework bar'.
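
To try this without a table, here is a small sketch that runs the same tsquery against the three sample titles as literals. The 'simple' configuration is an assumption (it matches the asker's "no dictionary" setup); the matches column should come out true for the first two rows and false for the third. Requires Postgres 9.6 or later.

```sql
-- Literal sample titles; no table or index needed.
SELECT t.title,
       to_tsvector('simple', t.title)
         @@ to_tsquery('simple', 'zend <-> fram:*') AS matches
FROM  (VALUES ('foo Zend framework bar'),
              ('Zend frames'),
              ('foo Zend has no framework bar')) AS t(title);
```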

Quoting the release notes for Postgres 9.6:

A phrase-search query can be specified in tsquery input using the new operators <-> and <N>. The former means that the lexemes before and after it must appear adjacent to each other in that order. The latter means they must be exactly N lexemes apart.
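
A quick way to see the difference between the two operators, again as a sketch with literal inputs and the assumed 'simple' configuration: with <2>, the second lexeme has to sit exactly two positions after the first, so a title with one word in between matches while an adjacent pair does not.

```sql
SELECT to_tsvector('simple', 'Zend PHP framework')
         @@ to_tsquery('simple', 'zend <2> fram:*') AS one_word_between,  -- positions 1 and 3
       to_tsvector('simple', 'Zend framework')
         @@ to_tsquery('simple', 'zend <2> fram:*') AS adjacent;          -- positions 1 and 2
```

The first column should be true and the second false; <-> behaves like <1>.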

For best performance, support the query with a GIN index:

CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (title_tsv);

Or don't store title_tsv in the table at all (bloating it and complicating writes). You can use an expression index instead:

CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (to_tsvector('english', title));

You need to specify the text search configuration (often language-specific) to make the expression immutable. And adapt the query accordingly:

...
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'zend <-> fram:*');
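
If you want to check that the planner can actually use the expression index, a throwaway session like the following may help. This is only a sketch: fts_demo and fts_demo_title_idx are made-up names, and enable_seqscan is switched off purely because a two-row table would otherwise always be scanned sequentially. The key point is that the expression in the WHERE clause has to match the indexed expression, including the 'english' configuration.

```sql
-- Hypothetical scratch table; dropped automatically at the end of the session.
CREATE TEMP TABLE fts_demo (title text);
INSERT INTO fts_demo (title)
VALUES ('foo Zend framework bar'), ('foo Zend has no framework bar');

CREATE INDEX fts_demo_title_idx
    ON fts_demo USING GIN (to_tsvector('english', title));

SET enable_seqscan = off;  -- demo only: nudges the planner toward the index

EXPLAIN
SELECT title
FROM   fts_demo
WHERE  to_tsvector('english', title) @@ to_tsquery('english', 'zend <-> fram:*');

RESET enable_seqscan;
```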

Answered by mgamba

Not a pretty solution, but it should do the job:

psql=# SELECT regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g') ;
   regexp_replace    
---------------------
 'zend':* & 'fram':*
(1 row)

It can be used like:

psql=# SELECT title FROM table WHERE title_tsv(title) @@ to_tsquery(regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g'));

How this works:

  1. Casts the plain tsquery to a string: cast(plainto_tsquery('Zend Fram') as text)
  2. Uses a regex to append the :* prefix matcher to each search term: regexp_replace(..., E'(\'\\w+\')', E'\\1:*', 'g')
  3. Converts it back to a non-plain tsquery: to_tsquery(...)
  4. Uses it in the search expression: SELECT title FROM table WHERE title_tsv(title) @@ ...
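
On Postgres 9.6 or later, the same cast-and-rebuild trick can be combined with phrase search. The following is a variation, not what this answer originally proposed: it assumes a table tbl with a title_tsv column built with the 'simple' configuration, lets phraseto_tsquery() join the words with <->, and appends :* to the last term only, so 'Zend Fram' should become 'zend' <-> 'fram':*.

```sql
-- Assumed names: tbl, title, title_tsv; 'simple' is an assumed configuration.
SELECT title
FROM   tbl
WHERE  title_tsv @@ (phraseto_tsquery('simple', 'Zend Fram')::text || ':*')::tsquery;
```

Note that an empty search string (or, with other configurations, one made up only of stop words) leaves just ':*', which is not a valid tsquery, so the input needs to be checked first.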

Answered by Tometzky

There's a way to do it in Postgres using trigrams and GIN/GiST indexes. There's a simple example, but with some rough edges, in this article by Kristo Kaiv: Substring Search.
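
For reference, a minimal sketch of the trigram route using the pg_trgm contrib extension (this is not taken from the linked article; tbl, title, and the index name are assumptions). With a trigram index in place, plain LIKE/ILIKE patterns with a leading wildcard can use the index, so no tsvector is involved at all:

```sql
-- Needs the privilege to create extensions in the current database.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; gin_trgm_ops is the operator class shipped with pg_trgm.
CREATE INDEX tbl_title_trgm_idx ON tbl USING GIN (title gin_trgm_ops);

SELECT title
FROM   tbl
WHERE  title ILIKE '%zend fram%';
```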
