Using SOUNDEX() word by word on SQL Server

Question

Asked by Charles

Here is my problem. For example, I have a table Products that contains a field, Name:

Products
ID | Name | ..
1  | "USB Key 10Go"
2  | "Intel computer"
3  | "12 inches laptop computer"
...

I'm currently implementing a simple search engine (SQL Server and ASP.NET, C#) for an iPhone web app, and I would like to use the SQL Server SOUNDEX() function.

The thing is, I can't use SOUNDEX directly on the Name field. (It would not be meaningful, since each name contains several words.) I would like to apply the SOUNDEX function to each word in the Name field, and then see whether any of them matches the search keyword.

If someone has any clue how to do this, that would be awesome.

Answer 1

Accepted answer by Gabriel McAdams

Have you looked into the Full-Text Search feature in SQL Server? I know this is not exactly what you asked for. It's just that the SOUNDEX() function is meant for finding similar-sounding names (e.g. SMITH and SMYTHE sound the same). In a search engine, however, how a word sounds is less important than the search words themselves. Full-Text Search also lets you use synonyms (you specify words that mean the same thing within your application's context) and have them considered automatically during the search.
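
As a quick illustration of the name-matching behaviour described above, the two names from the example can be compared directly. SOUNDEX and DIFFERENCE are built-in SQL Server functions; the column aliases are made up for readability.

```sql
-- Both names reduce to the same four-character code, S530.
SELECT SOUNDEX('Smith')  AS smith_code,
       SOUNDEX('Smythe') AS smythe_code,
       -- DIFFERENCE compares the two SOUNDEX codes:
       -- 0 means least similar, 4 means most similar.
       DIFFERENCE('Smith', 'Smythe') AS similarity;
```

Because the code is computed from the start of the whole string, a multi-word product name yields a code that mostly reflects its first word, which is why the question asks about handling each word separately.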

Look at these pages for more information about Full Text Search in SQL Server:

Introduction to Full-Text Search
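
A minimal sketch of what this could look like for the Products table from the question. It assumes the Full-Text Search component is installed and that Products has a unique, single-column, non-nullable index; the names ProductCatalog and PK_Products are hypothetical.

```sql
-- One-time setup: a full-text catalog and a full-text index on Products.Name.
CREATE FULLTEXT CATALOG ProductCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON Products (Name)
    KEY INDEX PK_Products      -- hypothetical name of the unique key index
    ON ProductCatalog;

-- Word-level match: rows whose Name contains the word 'computer'.
SELECT ID, Name
FROM Products
WHERE CONTAINS(Name, N'computer');

-- Synonym-aware match: expands 'computer' using the configured thesaurus.
SELECT ID, Name
FROM Products
WHERE CONTAINS(Name, N'FORMSOF(THESAURUS, computer)');
```

The index is populated in the background, so queries issued immediately after creating it may return nothing yet. The thesaurus query only adds matches once synonyms have been defined in the thesaurus file for the relevant language.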

Answer 2

Answered by Bob Jarvis - Reinstate Monica

Rather than use Soundex you might be better off computing the Levenshtein distance between the two strings. See the Wikipedia article on Levenshtein distance.

There's a TSQL implementation of the Levenshtein distance algorithm here.

Share and enjoy.

EDIT 03-May-2012

Since writing my original response I've learned that Oracle includes the Levenshtein distance and several other "string similarity" functions in the UTL_MATCH package, which I believe is a standard part of the database. Documentation here. Perhaps not directly related to the original post (which was for SQL Server) but perhaps useful as many shops use multiple databases.
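
For readers on Oracle, a call to the package might look like the sketch below. This is Oracle SQL, not T-SQL, and the sample strings and column aliases are chosen only for illustration.

```sql
-- Oracle only: UTL_MATCH is not available in SQL Server.
SELECT UTL_MATCH.EDIT_DISTANCE('kitten', 'sitting')            AS distance,   -- 3 single-character edits
       UTL_MATCH.EDIT_DISTANCE_SIMILARITY('kitten', 'sitting') AS similarity  -- normalized score, 0 to 100
FROM DUAL;
```

A small distance (or a high similarity score) between a search keyword and a word in a product name would indicate a likely match, regardless of how either word sounds.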

Answer 3

Answered by micahwittman

If you have to do it all in the RDBMS, a UDF would be best, if that's an option.

Otherwise, you could use this technique to at least soundex the first four words individually using PARSENAME:

From How do I split a string so I can access item x?:

PARSENAME(REPLACE('12 inches laptop computer', ' ', '.'), 1)  --return computer
PARSENAME(REPLACE('12 inches laptop computer', ' ', '.'), 2)  --return laptop
...

However, using PARSENAME this way is a hack, and it has a serious limitation: it only works for a maximum of 4 parts. If there are 5 or more words, PARSENAME returns NULL, so you have to check for that with a conditional and degrade gracefully.

Here's a simplified example (without the NULL checks):

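-- @search_input holds the search keyword, supplied by the caller as a parameter.
-- PARSENAME counts parts from the right, so part 1 is the last word of Name.
SELECT *
FROM Products
WHERE SOUNDEX(@search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 4))
   OR SOUNDEX(@search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 3))
   OR SOUNDEX(@search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 2))
   OR SOUNDEX(@search_input) = SOUNDEX(PARSENAME(REPLACE(Name, ' ', '.'), 1))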
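
On SQL Server 2016 or later (database compatibility level 130 or higher), the four-part limit can be avoided by splitting each name into one row per word with the built-in STRING_SPLIT function. The following is a sketch under that assumption and is not part of the original answer; @search_input is a hypothetical variable holding the keyword.

```sql
DECLARE @search_input nvarchar(100) = N'computer';  -- hypothetical search keyword

SELECT DISTINCT p.ID, p.Name                        -- DISTINCT: several words in one name may match
FROM Products AS p
CROSS APPLY STRING_SPLIT(p.Name, ' ') AS w          -- one row per space-separated word
WHERE w.value <> ''                                 -- skip empty pieces from repeated spaces
  AND SOUNDEX(w.value) = SOUNDEX(@search_input);
```

Like the PARSENAME version, this computes SOUNDEX for every word of every row at query time, so it cannot use an index. For a larger table, storing the per-word codes in a separate table when products are inserted or updated would likely perform better.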

Answer 4

Answered by Frunsi

SOUNDEX may be a perfect fit for your purpose, but please remember that it may not give good results for anything other than spoken British or American English words! It may even be usable for German words as they are pronounced, but it will not work well with ANYTHING else.

Answer 5

Answered by mike

You could try storing the metaphone of each word, concatenated with hyphens. E.g. the stored_metaphone field could contain something like '-AKTRF-SPLS-'. Then build a query like this:

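// Build the WHERE fragment: one LIKE test per search term, OR-ed together.
$where = '(';
$search_sql = array();
$search_terms = explode(' ', $search);
foreach ($search_terms as $term) {
    // The surrounding hyphens make LIKE match a whole stored key, not part of one.
    $search_sql[] = "`stored_metaphone` LIKE '%-".metaphone($term)."-%'";
}
$where .= implode(' OR ', $search_sql);
$where .= ')';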

NB: this is only the WHERE part of the query.

As far as I know, metaphone only works with English. The above SQL is working rather well on a number of sites.
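
For illustration, with two search terms whose metaphone keys are AKTRF and SPLS (the keys from the example above), the generated clause would look roughly like the sketch below. The surrounding SELECT and the Products table are assumptions borrowed from the question. The backticks in the PHP snippet are MySQL-style identifier quoting, which SQL Server does not accept, so they are dropped here.

```sql
-- Assumes a stored_metaphone column holding hyphen-delimited keys,
-- for example '-AKTRF-SPLS-'.
SELECT ID, Name
FROM Products
WHERE (stored_metaphone LIKE '%-AKTRF-%'
    OR stored_metaphone LIKE '%-SPLS-%');
```

Storing a hyphen before the first key and after the last one is what lets the same '%-KEY-%' pattern match a key at any position in the field.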
