Note: This page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/6259647/

MySQL match() against() - order by relevance and column?

Tags: mysql, full-text-search

Asked by Kristoffer la Cour

Okay, so I'm trying to run a full-text search across multiple columns, something simple like this:

SELECT * FROM pages WHERE MATCH(head, body) AGAINST('some words' IN BOOLEAN MODE)

Now I want to order by relevance (how many of the words are found?), which I have been able to do with something like this:

SELECT * , MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE) AS relevance 
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY relevance DESC

Now here comes the part where I get lost: I want to prioritize relevance in the head column.

I guess I could make two relevance columns, one for head and one for body, but at that point I'd be running more or less the same search on the table three times. For what I'm building, performance is important, since the query will also be joined with and matched against other tables.

So, my main question is: is there a faster way to search by relevance and prioritize certain columns? (And, as a bonus, possibly even make relevance count the number of times the words occur in the columns?)

Any suggestions or advice would be great.

Note: I will be running this on a LAMP server (WAMP in local testing).

Answered by Denis de Bernardy

This might give the head part the increased relevance that you want. It won't double it, but it might be good enough for your purposes:

SELECT pages.*,
       MATCH (head, body) AGAINST ('some words') AS relevance,
       MATCH (head) AGAINST ('some words') AS title_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words')
ORDER BY title_relevance DESC, relevance DESC

-- alternatively:
ORDER BY title_relevance + relevance DESC

An alternative that you may also want to investigate, if you have the flexibility to switch DB engines, is Postgres. It lets you set the weight of individual parts of a document and tune the ranking.
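As a rough illustration of the Postgres approach, the sketch below uses the built-in setweight and ts_rank functions so that matches in head count for more than matches in body. It assumes the same pages(head, body) table exists in Postgres and uses the 'english' text search configuration; both are assumptions for illustration, not part of the original answer.

```sql
-- Sketch for PostgreSQL, assuming a pages(head, body) table.
-- Label 'A' marks head as the most important part, 'D' marks body as the least.
SELECT pages.*,
       ts_rank(
           setweight(to_tsvector('english', coalesce(head, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(body, '')), 'D'),
           plainto_tsquery('english', 'some words')
       ) AS relevance
FROM pages
WHERE to_tsvector('english', coalesce(head, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('english', 'some words')
ORDER BY relevance DESC;
```

With ts_rank's default weights, label A counts as 1.0 and label D as 0.1. On larger tables the tsvector would normally be stored in its own column with a GIN index instead of being computed per row as it is here.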

Answered by Camilla

Just adding this for anyone who might need it: don't forget to add a FULLTEXT index to the table!

ALTER TABLE table_name ADD FULLTEXT(column_name);
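For the queries on this page, the column list inside MATCH() generally has to match the column list of some FULLTEXT index exactly (boolean-mode searches on MyISAM tables are the exception). A minimal sketch for the pages table from the question, assuming one search over (head, body) and one over head alone; the index names are hypothetical:

```sql
-- Hypothetical index names; the pages table and its columns come from the question.
-- Needed for MATCH (head, body) AGAINST (...)
ALTER TABLE pages ADD FULLTEXT INDEX ft_head_body (head, body);

-- Needed for MATCH (head) AGAINST (...)
ALTER TABLE pages ADD FULLTEXT INDEX ft_head (head);
```

Each extra index costs disk space and slows writes a little, so it is probably worth adding only the combinations the queries actually use.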

Answered by jisaacstone

I have never done so, but it seems like

MATCH (head, head, body) AGAINST ('some words' IN BOOLEAN MODE)

should give double weight to matches found in head.

I just read this comment on the docs page and thought it might be of value to you:

Posted by Patrick O'Lone on December 9 2002 6:51am

It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to:

SELECT MATCH (Content) AGAINST ('keyword1 keyword2') AS Relevance
FROM table_name
WHERE MATCH (Content) AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE)
HAVING Relevance > 0.2
ORDER BY Relevance DESC

Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.

This may or may not be a bug in the way that IN BOOLEAN MODE operates, although the comments I've read on the mailing list suggest that IN BOOLEAN MODE's relevance ranking is not very complicated, thus lending itself poorly for actually providing relevant documents. BTW - I didn't notice a performance loss for doing this, since it appears MySQL only performs the FULLTEXT search once, even though the two MATCH clauses are different. Use EXPLAIN to prove this.

So it would seem you may not need to worry about calling the full-text search twice, though you should still "use EXPLAIN to prove this".
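A minimal sketch of that check, assuming the pages table from the question and a FULLTEXT index on (head, body):

```sql
-- Prefix the query with EXPLAIN to see how MySQL plans to execute it.
EXPLAIN
SELECT pages.*,
       MATCH (head, body) AGAINST ('some words') AS relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE);
```

A value of fulltext in the type column of the output indicates that the FULLTEXT index is being used for the lookup. EXPLAIN shows the access plan rather than timings, so comparing run times with and without the extra MATCH in the select list is still a sensible second check.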

Answered by Noah King

I was just playing around with this, too. One way you can add extra weight is in the ORDER BY clause of the query.

For example, if you were matching 3 different columns and wanted to weight certain columns more heavily:

SELECT search.*,
MATCH (name) AGAINST ('black' IN BOOLEAN MODE) AS name_match,
MATCH (keywords) AGAINST ('black' IN BOOLEAN MODE) AS keyword_match,
MATCH (description) AGAINST ('black' IN BOOLEAN MODE) AS description_match
FROM search
WHERE MATCH (name, keywords, description) AGAINST ('black' IN BOOLEAN MODE)
ORDER BY (name_match * 3  + keyword_match * 2  + description_match) DESC LIMIT 0,100;
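The same idea could be applied to the pages table from the question. The sketch below is one possible variant, not a tested solution: it filters in boolean mode but scores with natural-language MATCH expressions, following the docs comment quoted earlier about boolean-mode relevance. It assumes FULLTEXT indexes exist on (head), (body), and (head, body), and the factor of 2 is an arbitrary illustrative weight.

```sql
-- Assumes FULLTEXT indexes on (head), (body) and (head, body).
-- The multiplier 2 is an arbitrary weight chosen for illustration.
SELECT pages.*,
       MATCH (head) AGAINST ('some words') * 2
     + MATCH (body) AGAINST ('some words') AS weighted_relevance
FROM pages
WHERE MATCH (head, body) AGAINST ('some words' IN BOOLEAN MODE)
ORDER BY weighted_relevance DESC
LIMIT 0, 100;
```

Since performance matters for the original use case, running EXPLAIN on a query like this and timing it against realistic data would be the way to confirm the extra MATCH expressions are acceptable.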