MySQL: How to allow full-text searching with hyphens in the search query

Note: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and cite the original StackOverflow URL: http://stackoverflow.com/questions/5192499/

Date: 2020-08-31 18:59:25  Source: igfitidea

How to allow fulltext searching with hyphens in the search query

Tags: mysql, search, special-characters, full-text-search, hyphen

Asked by Jay

I have keywords like "some-or-other" where the hyphens matter when searching my MySQL database. I'm currently using the full-text search function.

Is there a way to escape the hyphen character? I know that one option is to comment out #define HYPHEN_IS_DELIM in the myisam/ftdefs.h file, but unfortunately my host does not allow this. Is there another option out there?

Edit 3-8-11 Here's the code I have right now:

$search_input = $_GET['search_input'];
$keyword_safe = mysql_real_escape_string($search_input);
$keyword_safe_fix = "*'\"" . $keyword_safe . "\"'*";


$sql = "
    SELECT *,
        MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix') AS score
        FROM table_name
    WHERE MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix')
    ORDER BY score DESC
";

Answered by Yasen Zhelev

From here http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

One solution for finding a word that contains dashes or hyphens is to use full-text search IN BOOLEAN MODE and to enclose the hyphenated word in double quotes.

Or from here http://bugs.mysql.com/bug.php?id=2095

There is another workaround. It was recently added to the manual: "Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes."

I have not tried it myself.

Edit: Here is some additional info from http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html

A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include nonword characters in the phrase. As of MySQL 5.0.3, nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase" in MySQL 5.0.3, but not before.

If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.
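
The double-quote approach above can be sketched in PHP as follows. This is only an illustration under stated assumptions: quoteHyphenatedTerms() is a hypothetical helper (not a MySQL or PHP API), it handles at most one leading boolean operator per term, and whether the phrase actually matches still depends on ft_min_word_len and the stopword list, as the manual excerpt says.

```php
<?php
declare(strict_types=1);

/**
 * Wrap every hyphenated term in double quotes so that boolean mode
 * treats it as a phrase instead of reading "-" as an operator.
 * Hypothetical helper for illustration only.
 */
function quoteHyphenatedTerms(string $input): string
{
    // Drop quotes typed by the user so the quotes added below stay balanced.
    $clean = str_replace('"', '', $input);
    $terms = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    $out = [];
    foreach ($terms as $term) {
        // Separate an optional leading boolean operator from the word itself.
        preg_match('/^([+\-<>~]?)(.*)$/', $term, $m);
        // A hyphen between two word characters marks a compound such as some-or-other.
        $out[] = preg_match('/\w-\w/', $m[2]) === 1
            ? $m[1] . '"' . $m[2] . '"'
            : $term;
    }
    return implode(' ', $out);
}

$against = quoteHyphenatedTerms('some-or-other keyword');
echo $against, "\n";

// Bind $against as a parameter (twice) rather than concatenating it into the SQL.
$sql = 'SELECT *, MATCH(coln1, coln2, coln3) AGAINST(? IN BOOLEAN MODE) AS score
        FROM table_name
        WHERE MATCH(coln1, coln2, coln3) AGAINST(? IN BOOLEAN MODE)
        ORDER BY score DESC';
```

The column and table names are the placeholders from the question; the prepared-statement form is an assumption about how the query would be executed (for example with PDO or mysqli).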

Answered by mgutt

Some people suggest using the following query:

SELECT id 
FROM texts
WHERE MATCH(text) AGAINST('well-known' IN BOOLEAN MODE)
HAVING text LIKE '%well-known%';

But with that approach you need many variants, depending on the full-text operators used. Task: build a query like +well-known +(>35-hour <39-hour) working week*. Too complex!

And do not forget the default value of ft_min_word_len (4), so a search for up-to-date only matches date in your results.
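
To see why only date survives, here is a rough PHP approximation of the length filter. It is not MySQL's parser: indexedWords() is a hypothetical helper that only splits on non-word characters and drops tokens shorter than the minimum length, and it ignores stopwords entirely.

```php
<?php
declare(strict_types=1);

/**
 * Approximate which words are long enough to be indexed.
 * Assumption: words are runs of letters, digits and underscores, and
 * $minWordLen mirrors ft_min_word_len (default 4). Stopwords are ignored.
 */
function indexedWords(string $text, int $minWordLen = 4): array
{
    $tokens = preg_split('/[^A-Za-z0-9_]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter(
        $tokens,
        static fn(string $t): bool => strlen($t) >= $minWordLen
    ));
}

print_r(indexedWords('up-to-date'));      // "up" and "to" are too short
print_r(indexedWords('word-separated'));  // both parts are long enough
```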

Trick

Because of that I prefer a trick, so that constructions with HAVING etc. aren't needed at all:

  1. Instead of adding the following text to your database table:

    "The Up-to-Date Sorcerer" is a well-known science fiction short story.

    copy the hyphenated words, without their hyphens, to the end of the text inside a comment:

    "The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->

  2. If the user searches for up-to-date, remove the hyphens in the SQL query:

    MATCH(text) AGAINST('uptodate' IN BOOLEAN MODE)
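
The two steps above could be automated roughly like this. It is a minimal sketch, not the answer author's code: appendSearchHints() and removeInnerHyphens() are hypothetical helper names, and "word" is assumed to mean a run of letters, digits and underscores.

```php
<?php
declare(strict_types=1);

/**
 * Step 1: append the hyphenated words of $text, with hyphens removed,
 * inside an HTML comment so the full-text index also sees e.g. "UptoDate".
 */
function appendSearchHints(string $text): string
{
    // Words joined by single hyphens, such as Up-to-Date or well-known.
    preg_match_all('/\b\w+(?:-\w+)+\b/', $text, $matches);
    $hints = array_unique(array_map(
        static fn(string $w): string => str_replace('-', '', $w),
        $matches[0]
    ));
    return $hints === [] ? $text : $text . '<!-- ' . implode(' ', $hints) . ' -->';
}

/**
 * Step 2: remove hyphens that sit between two word characters in the
 * user's query, leaving a leading "-" (the boolean NOT operator) untouched.
 */
function removeInnerHyphens(string $query): string
{
    return preg_replace('/(?<=\w)-(?=\w)/', '', $query) ?? $query;
}

echo appendSearchHints('"The Up-to-Date Sorcerer" is a well-known science fiction short story.'), "\n";
echo removeInnerHyphens('-well-known +science'), "\n";
```

Because only hyphens between word characters are removed, the other boolean operators in the query pass through unchanged.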

That way your user can find up-to-date as one word instead of getting every result that merely contains date (because ft_min_word_len kills up and to).

Of course, before you echo the texts you should remove the <!-- ... --> comments.
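
Removing the comments before output might look like the sketch below. stripSearchHints() is a hypothetical helper, and it assumes the stored text contains no other HTML comments that need to be kept.

```php
<?php
declare(strict_types=1);

/**
 * Strip HTML comments (including the appended search hints) before output.
 * Assumption: every comment in the stored text is a search hint.
 */
function stripSearchHints(string $text): string
{
    // The s modifier lets "." match newlines, so multi-line comments go too.
    return preg_replace('/<!--.*?-->/s', '', $text) ?? $text;
}

$row = '"The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->';
echo stripSearchHints($row), "\n";
```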

Advantages

  • the query is simpler
  • the user is able to use all fulltext operators as usual
  • the query is faster.
  • If a user searches for -well-known +science, MySQL treats that as: must not include "well", could include "known", and must include "science". This isn't what the user expected. The trick solves that, too (as the SQL query searches for -wellknown +science).

Answered by Hutcho

It may be simpler to use the BINARY operator.

SELECT * 
FROM your_table_name 
WHERE BINARY your_column = BINARY "Foo-Bar%AFK+LOL"

http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html#operator_binary

The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character. This causes the comparison to be case sensitive even if the column is not defined as BINARY or BLOB. BINARY also causes trailing spaces to be significant.

Answered by Félix Gagnon-Grenier

This might sound off, but after struggling with this for a while, I realised I get the results I want by removing the hyphen from the search expression. For instance, searching for 'word-separated' with the query

SELECT * FROM table WHERE MATCH(column) AGAINST ('word separated');

returns instances of 'word-separated' as needed. This also returns other instances of separated and word, but adding the + operator to each word, which is only interpreted IN BOOLEAN MODE, achieves the hyphen search.

SELECT * FROM table WHERE MATCH(column) AGAINST ('+word +separated' IN BOOLEAN MODE);
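
Turning a hyphenated term into that form can be done with a few lines of PHP. requireAllParts() is a hypothetical helper shown only as a sketch; it assumes the term contains no other boolean operators, and the + prefix only has an effect when the query runs IN BOOLEAN MODE.

```php
<?php
declare(strict_types=1);

/**
 * Split a hyphenated term on its hyphens and mark every part as required,
 * e.g. for use inside AGAINST(... IN BOOLEAN MODE).
 */
function requireAllParts(string $term): string
{
    $parts = preg_split('/-+/', $term, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(
        static fn(string $p): string => '+' . $p,
        $parts
    ));
}

echo requireAllParts('word-separated'), "\n";
```

Note that this requires both words to be present in a row, not that they appear next to each other.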

Answered by Codemonkey

My preferred solution is to remove the hyphen from both the search term and the data being searched. I keep two columns in my full-text table: search and return. search contains sanitised data with various characters removed, and it is what the users' search terms are compared against, after my code has sanitised those as well.

Then I display the return column.

It does mean I have two copies of the data in my database, but for me that trade-off is well worth it. My FT table is only ~500k rows, so it's not a big deal in my use case.
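
A sketch of the shared sanitising step, applied to both the stored data and the user's search term, could look like this. The answer does not say which characters are removed, so the rule used here (delete everything that is not a letter, digit or whitespace) is an assumption, and sanitizeForSearch() is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Normalise a string for the "search" column and for user search terms.
 * Assumption: all punctuation, including hyphens, is deleted outright.
 */
function sanitizeForSearch(string $value): string
{
    // Delete rather than replace with a space, so "some-or-other" becomes "someorother".
    $value = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $value) ?? $value;
    // Collapse any runs of whitespace that remain.
    return trim(preg_replace('/\s+/u', ' ', $value) ?? $value);
}

$original = 'Some-or-other: a well-known keyword!';
$row = [
    'return' => $original,                     // shown to the user
    'search' => sanitizeForSearch($original),  // covered by the FULLTEXT index
];
$userTerm = sanitizeForSearch('some-or-other');

echo $row['search'], "\n";
echo $userTerm, "\n";
```

Running the same function over both sides is what keeps the stored text and the search term comparable.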
