编辑:MySQL + PHP
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4671378/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Levenshtein: MySQL + PHP
提问by rik
$word = strtolower($_GET['term']);
$lev = 0;
$q = mysql_query("SELECT `term` FROM `words`");
while($r = mysql_fetch_assoc($q))
{
$r['term'] = strtolower($r['term']);
$lev = levenshtein($word, $r['term']);
if($lev >= 0 && $lev < 5)
{
$word = $r['term'];
}
}
How can I move all that into just one query? Don't want to have to query through all terms and do the filtering in PHP.
我怎样才能将所有这些都移动到一个查询中?不想查询所有术语并在 PHP 中进行过滤。
回答by rik
You need a levenshtein function in MySQLand query like
您需要在 MySQL 中使用levenshtein 函数并进行查询,例如
$word = mysql_real_escape_string($word);
mysql_qery("SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND 4");
回答by Ian Atkin
There are two ways to implement a Levenshtein function in MySQL. The first is to create a STORED FUNCTION which operates much like a STORED TRANSACTION, except it has distinct inputs and an output. This is fine for small datasets, but a little slow on anything approaching several thousand rows.
在 MySQL 中有两种实现 Levenshtein 函数的方法。第一个是创建一个 STORED FUNCTION,它的操作与 STORED TRANSACTION 非常相似,只是它具有不同的输入和输出。这对于小型数据集来说很好,但对于接近数千行的任何数据都会有点慢。
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
IF s1_char = SUBSTRING(s2, j, 1) THEN
SET cost = 0; ELSE SET cost = 1;
END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN
SET c = c_temp;
END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END//
Store the above code in a .sql file and import it into your database like so:
将上述代码存储在 .sql 文件中,然后将其导入数据库,如下所示:
source /tmp/mysql_udf.sql
The second method is to implement a User Defined Function in C/C++ and link it into MySQL as a shared library (*.so file). This method also uses a STORED FUNCTION to call the library, which means the actual query for this or the first method may be identical (providing the inputs to both functions are the same). You can find out more about this method here: http://samjlevy.com/mysql-levenshtein-and-damerau-levenshtein-udfs/
第二种方法是在 C/C++ 中实现用户定义的函数,并将其作为共享库(*.so 文件)链接到 MySQL。此方法还使用 STORED FUNCTION 来调用库,这意味着对该方法或第一种方法的实际查询可能相同(提供两个函数的输入相同)。您可以在此处找到有关此方法的更多信息:http: //samjlevy.com/mysql-levenshtein-and-damerau-levenshtein-udfs/
With either of these methods, your query would be something like:
使用这两种方法中的任何一种,您的查询将类似于:
SELECT term FROM words WHERE levenshtein(term, 'term') < 5;
Also, remember that the 'threshold' value should change in relation to the original word length. It's better to think of it in terms of a percentage value, i.e. half your word = 50%, half of 'term' = 2.
另外,请记住,“阈值”值应该相对于原始字长而变化。最好用百分比值来考虑它,即你的单词的一半 = 50%,'term' 的一半 = 2。
回答by carlosvini
If you have a huge database, you can filter the words first using SOUNDEX:
如果你有一个庞大的数据库,你可以先使用 SOUNDEX 过滤单词:
$word = strtolower(mysql_real_escape_string($_GET['term']));
$rs = mysql_query("SELECT LOWER(`term`) FROM `words` WHERE SOUNDEX(term) = SOUNDEX(" . $word . ")");
while ($row = mysql_fetch_assoc($rs)) {
$lev = levenshtein($word, $row['term']);
....
}
If you have time enough to play with a C extension or procedure, you may achieve better performance, but filtering the records on mysql before applying real levenshtein will make things faster with almost no effort.
如果您有足够的时间来玩 C 扩展或程序,您可能会获得更好的性能,但是在应用真正的 levenshtein 之前过滤 mysql 上的记录将使事情变得更快,几乎不费吹灰之力。
回答by John Rausch
If you are dealing with very large data sets I have found that it is much more efficient to handle the Levenshtein operations and sorting in PHP than it is in MySQL. e.g. query of about 1000 records:
如果您正在处理非常大的数据集,我发现在 PHP 中处理 Levenshtein 操作和排序比在 MySQL 中更有效。例如查询大约 1000 条记录:
MySQL( ~ 0.0050s) -> PHP Levenshtein( ~ 1.300s)
MySQL(~0.0050s) -> PHP Levenshtein(~1.300s)
vs.
对比
MySQL Levenshtein( >= 5.000s) -> PHP( ~ 0.250s)
MySQL Levenshtein(>= 5.000s) -> PHP( ~ 0.250s)
There are also many other options for optimizing search engines but if you want to use Levenshtein just be aware of the data you'll be handling and the latencies you want.
还有许多其他选项可用于优化搜索引擎,但如果您想使用 Levenshtein,请注意您将处理的数据和您想要的延迟。
回答by Coder55
I suggest you including the call of the levenshtein(link: http://www.artfulsoftware.com/infotree/queries.php#552) into your query.
我建议您将 levenshtein(链接:http: //www.artfulsoftware.com/infotree/queries.php#552)的调用包含在 您的查询中。
You should use mysqli_query($q) because mysql_query($q) is deprecated and may be removed in future php versions!
您应该使用 mysqli_query($q) 因为 mysql_query($q) 已被弃用,并且可能会在未来的 php 版本中删除!
$word = mysql_real_escape_string($word);
$query = "SELECT `term` FROM `words` WHERE levenshtein('$word', `term`) BETWEEN 0 AND 4";
mysqli_qery($query);
回答by mvbl fst
You can make this code look a bit neater but @profitphp is right, you can't doing it in MySQL without a levenstein library.
你可以让这段代码看起来更整洁一点,但@profitphp 是对的,你不能在没有 levenstein 库的情况下在 MySQL 中做到这一点。
$word = strtolower($_GET['term']); $q = mysql_uqery("SELECT LOWER(`term`) FROM `words`"); while($r = mysql_fetch_assoc($q)) { $lev = levenshtein($word, $r['term']); .... }
回答by Randy
I do this in Oracle by implementing the algorithm in PL/SQL inside a function that can be called.
我在 Oracle 中通过在可调用的函数内在 PL/SQL 中实现算法来做到这一点。
回答by profitphp
That is one query. If you're asking if you can move the levenshtein functionality to mysql, you can't.
那是一个查询。如果您问是否可以将 levenshtein 功能移至 mysql,则不能。
Ok, well you can, but its not any easier than just doing it in php.
好吧,你可以,但它并不比在 php 中做更容易。
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#552
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#552