Original URL: http://stackoverflow.com/questions/1607690/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Finding similar sounding text in VBA
Question by Ed Lee
My manager tells me that there is a way to evaluate names that are spelled differently but sound similar when pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits", but I cannot find any info on Google.
Does this exist? Does anyone know if it is available for VBA (Access)?
Answer by Lawrence P. Kelley
Nice question! Your question includes a great example of the idea itself.
There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by how they sound rather than how they are spelled. In this question, Soundits and Soundex are similar-sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]
About Soundex:
The Soundex algorithm is predicated on characteristics of English such as:
Soundex 算法基于英语的特征,例如:
- The first letter has high significance
- Many consonants sound similar
- Consonants affect pronunciation more than vowels
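These principles translate fairly directly into code. Below is a minimal sketch of the standard (American) Soundex rules as they are usually stated. The names SoundexAmerican and SoundexDigit are hypothetical. The sketch keeps the first letter and codes the remaining consonants by group. It collapses adjacent letters with the same code, including across H and W, and lets a vowel separate two equal codes. Finally, it pads or cuts the result to four characters. It upper-cases every character, so it does not depend on Option Compare. It treats Y like a vowel, and implementations differ on that point. Because of the vowel rule, this sketch can disagree with the variation given later in this answer. For example, it codes "Soundits" as S533 rather than S532, because the I between D and T causes T to be coded again.

```vba
Option Explicit

' Minimal sketch of the standard (American) Soundex rules.
' Hypothetical names: SoundexAmerican, SoundexDigit.
Public Function SoundexAmerican(ByVal strName As String) As String
    Dim strOut As String
    Dim strPrior As String   ' code of the last coded letter; "0" after a vowel
    Dim strDigit As String
    Dim lngPos As Long

    strName = Trim$(strName)
    If Len(strName) = 0 Then Exit Function   ' empty input returns ""

    ' Keep the first letter; its code still blocks an immediate repeat.
    strOut = UCase$(Left$(strName, 1))
    strPrior = SoundexDigit(strOut)

    For lngPos = 2 To Len(strName)
        strDigit = SoundexDigit(Mid$(strName, lngPos, 1))
        If strDigit = "0" Then
            ' A vowel separates two equal codes, so the next one is kept.
            strPrior = "0"
        ElseIf strDigit <> "" Then
            ' Adjacent letters with the same code are coded once.
            If strDigit <> strPrior Then strOut = strOut & strDigit
            strPrior = strDigit
        End If
        ' H, W and non-letters return "" and leave strPrior unchanged,
        ' so equal codes on either side of them still collapse.
        If Len(strOut) = 4 Then Exit For
    Next lngPos

    ' Pad short codes with zeros; the loop never builds more than four.
    SoundexAmerican = Left$(strOut & "000", 4)
End Function

' Maps one character to its Soundex digit ("0" for vowels, "" for the rest).
Private Function SoundexDigit(ByVal strChar As String) As String
    Select Case UCase$(strChar)
        Case "B", "F", "P", "V": SoundexDigit = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z": SoundexDigit = "2"
        Case "D", "T": SoundexDigit = "3"
        Case "L": SoundexDigit = "4"
        Case "M", "N": SoundexDigit = "5"
        Case "R": SoundexDigit = "6"
        Case "A", "E", "I", "O", "U", "Y": SoundexDigit = "0"  ' Y treated as a vowel here
        Case Else: SoundexDigit = ""  ' H, W, digits, punctuation
    End Select
End Function
```

For example, "Robert" and "Rupert" both give R163, "Ashcraft" gives A261 (the H does not separate S and C), and "Tymczak" gives T522 (the A does separate Z and K).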
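One warning: Soundex was designed for names, and the shorter the name, the better it works. The code keeps only the first letter and at most three digits, so everything after the first few consonants is ignored. As a result, Soundex becomes less reliable as names grow longer.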
Resources:
- Here is an example that uses VBA for Access.
- There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
- There is a lot of information about Soundex and other variants such as Soundex2 (search for 'Soundex' and 'VBA').
Code Example:
Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.
Option Compare Database
Option Explicit

' Returns a four-character Soundex-style code, or Null for empty/Null input.
Public Function Soundex(varText As Variant) As Variant
    On Error GoTo Err_Handler
    Dim strSource As String
    Dim strOut As String
    Dim strValue As String
    Dim strPriorValue As String
    Dim lngPos As Long

    If Not IsError(varText) Then
        strSource = Trim$(Nz(varText, vbNullString))
        If strSource <> vbNullString Then
            ' Keep the first letter as-is and remember its code so that a
            ' following letter with the same code is not repeated.
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&
            Do
                ' Past the end of the string Mid$ returns "", which
                ' SoundexValue maps to "0"; this pads short codes with zeros.
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                ' Append a code only if it differs from the previous one.
                ' Uncoded letters (vowels, H, W, Y) are skipped without
                ' resetting strPriorValue; standard Soundex lets a vowel
                ' separate two equal codes.
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If

    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If

Exit_Handler:
    Exit Function

Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    Resume Exit_Handler
End Function

' Maps one character to its Soundex digit. Under Option Compare Database
' (case-insensitive in a default Access database) lowercase letters match too.
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
        Case "B", "F", "P", "V"
            SoundexValue = "1"
        Case "C", "G", "J", "K", "Q", "S", "X", "Z"
            SoundexValue = "2"
        Case "D", "T"
            SoundexValue = "3"
        Case "L"
            SoundexValue = "4"
        Case "M", "N"
            SoundexValue = "5"
        Case "R"
            SoundexValue = "6"
        Case vbNullString
            SoundexValue = "0"
        Case Else
            'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function
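The original goal was to return exact matches plus similar-sounding names. Access queries can call public functions from a standard module, so the Soundex() function above can be used directly in SQL when the query runs inside Access. Below is a minimal sketch of that idea. It assumes a hypothetical table tblPeople with a LastName field, and BuildNameSearchSql is also a hypothetical name. The query returns every row whose Soundex code matches the search name's code and sorts exact spellings first. Every exact match also has the same Soundex code, so the WHERE clause needs no separate equality test.

```vba
Option Explicit

' Hypothetical table and field names (tblPeople, LastName); adjust to your schema.
' Relies on a public Soundex() function, such as the one above, being
' in a standard module of the same Access database.
Public Function BuildNameSearchSql(ByVal strName As String) As String
    Dim strLiteral As String

    ' Quote the search name as an SQL string literal, doubling any
    ' embedded single quotes so names like O'Brien stay valid.
    strLiteral = "'" & Replace(Trim$(strName), "'", "''") & "'"

    ' One condition returns both exact and similar-sounding names;
    ' the ORDER BY puts exact spellings first.
    BuildNameSearchSql = "SELECT * FROM tblPeople" & _
        " WHERE Soundex(LastName) = Soundex(" & strLiteral & ")" & _
        " ORDER BY IIf(LastName = " & strLiteral & ", 0, 1), LastName"
End Function
```

A recordset could then be opened with CurrentDb.OpenRecordset(BuildNameSearchSql("Smyth")). Computing Soundex(LastName) for every row prevents the query from using an index on LastName. On large tables, a common alternative is to store the code in its own indexed column.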
Levenshtein distance
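Another method of comparing strings is to compute the Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Here is an example in VBA, taken from the LessThanDot Wiki: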
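The table that the VBA code below fills in follows the standard recurrence. Here, d(i, j) is the distance between the first i characters of word1 (written s) and the first j characters of word2 (written t). The bracket [s_i ≠ t_j] is the code's cost variable: 1 when the characters differ and 0 when they match. The three terms are deletion, insertion and substitution, in the same order as a(0), a(1) and a(2) in the code.

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j,\\
d(i,j) &= \min\bigl(\, d(i-1,j) + 1,\; d(i,j-1) + 1,\; d(i-1,j-1) + [\,s_i \neq t_j\,] \,\bigr)
\end{aligned}
```

For example, turning "kitten" into "sitting" takes two substitutions (k to s, e to i) and one insertion (g), so d(6, 7) = 3.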
Function LevenshteinDistance(word1, word2)
    Dim s As Variant
    Dim t As Variant
    Dim d As Variant
    Dim m, n
    Dim i, j, k
    Dim a(2), r
    Dim cost

    m = Len(word1)
    n = Len(word2)

    ''This is the only way to use
    ''variables to dimension an array
    ReDim s(m)
    ReDim t(n)
    ReDim d(m, n)

    ' Split both words into 1-based arrays of single characters.
    For i = 1 To m
        s(i) = Mid(word1, i, 1)
    Next
    For i = 1 To n
        t(i) = Mid(word2, i, 1)
    Next

    ' d(i, j) is the distance between the first i characters of word1 and
    ' the first j characters of word2; row 0 and column 0 are distances
    ' to and from the empty string.
    For i = 0 To m
        d(i, 0) = i
    Next
    For j = 0 To n
        d(0, j) = j
    Next

    For i = 1 To m
        For j = 1 To n
            If s(i) = t(j) Then
                cost = 0
            Else
                cost = 1
            End If
            a(0) = d(i - 1, j) + 1 '' deletion
            a(1) = d(i, j - 1) + 1 '' insertion
            a(2) = d(i - 1, j - 1) + cost '' substitution
            ' Keep the cheapest of the three operations.
            r = a(0)
            For k = 1 To UBound(a)
                If a(k) < r Then r = a(k)
            Next
            d(i, j) = r
        Next
    Next

    LevenshteinDistance = d(m, n)
End Function
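The function above keeps the full (m+1) by (n+1) table as Variants. Each row depends only on the row before it, so the same recurrence can be computed with two typed rows. Below is a sketch under that approach. The names LevenshteinTwoRow and LevenshteinSimilarity are hypothetical. The similarity helper divides the distance by the longer length and subtracts the result from 1. That is one common normalization; it is not part of the LessThanDot code. Character comparison follows the module's Option Compare setting, so it is case-sensitive under the default Option Compare Binary.

```vba
Option Explicit

' Hypothetical names: LevenshteinTwoRow, LevenshteinSimilarity.
' Same recurrence as LevenshteinDistance, keeping only two rows of the table.
Public Function LevenshteinTwoRow(ByVal word1 As String, ByVal word2 As String) As Long
    Dim m As Long, n As Long, i As Long, j As Long
    Dim cost As Long, best As Long
    Dim prevRow() As Long, currRow() As Long

    m = Len(word1)
    n = Len(word2)
    ReDim prevRow(0 To n)
    ReDim currRow(0 To n)

    For j = 0 To n
        prevRow(j) = j                 ' empty string to first j chars of word2
    Next j

    For i = 1 To m
        currRow(0) = i                 ' first i chars of word1 to empty string
        For j = 1 To n
            If Mid$(word1, i, 1) = Mid$(word2, j, 1) Then cost = 0 Else cost = 1
            best = prevRow(j) + 1                                             ' deletion
            If currRow(j - 1) + 1 < best Then best = currRow(j - 1) + 1       ' insertion
            If prevRow(j - 1) + cost < best Then best = prevRow(j - 1) + cost ' substitution
            currRow(j) = best
        Next j
        For j = 0 To n                 ' current row becomes the previous row
            prevRow(j) = currRow(j)
        Next j
    Next i

    LevenshteinTwoRow = prevRow(n)
End Function

' 1 = identical; 0 when the distance equals the longer string's length.
Public Function LevenshteinSimilarity(ByVal word1 As String, ByVal word2 As String) As Double
    Dim maxLen As Long
    maxLen = Len(word1)
    If Len(word2) > maxLen Then maxLen = Len(word2)
    If maxLen = 0 Then
        LevenshteinSimilarity = 1      ' two empty strings
    Else
        LevenshteinSimilarity = 1 - LevenshteinTwoRow(word1, word2) / maxLen
    End If
End Function
```

For example, LevenshteinTwoRow("kitten", "sitting") returns 3, and LevenshteinSimilarity("Smith", "Smyth") returns 0.8 (one substitution out of five characters).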
Answer by ewall
Here are a couple of working examples of the SOUNDEX algorithm in VBA:
Answer by David-W-Fenton
In addition to Soundex, which often gives too loose a match to be really useful, you should also look at Soundex2 (a more granular variant of Soundex) and, for a different kind of matching, Simil(). I use all three.
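Simil() is usually described as an implementation of the Ratcliff/Obershelp "gestalt pattern matching" algorithm. Assuming that, here is a minimal sketch of the idea. SimilSketch and MatchedChars are hypothetical names, and this is not the function David uses. The sketch finds the longest common substring and then repeats the search on the unmatched pieces to its left and right. It returns twice the number of matched characters divided by the total length of both strings. Comparison is case-sensitive under the default Option Compare Binary. The search tries every pair of start positions, so it is meant for short strings such as names.

```vba
Option Explicit

' Hypothetical names; a sketch of Ratcliff/Obershelp matching, not the
' actual Simil() code. Returns 2 * matched chars / total chars, 0 to 1.
Public Function SimilSketch(ByVal s1 As String, ByVal s2 As String) As Double
    Dim lngTotal As Long
    lngTotal = Len(s1) + Len(s2)
    If lngTotal = 0 Then
        SimilSketch = 1                 ' two empty strings are identical
    Else
        SimilSketch = 2 * MatchedChars(s1, s2) / lngTotal
    End If
End Function

' Finds the longest common substring, then recurses on the unmatched
' pieces to its left and to its right, summing the matched lengths.
Private Function MatchedChars(ByVal s1 As String, ByVal s2 As String) As Long
    Dim i As Long, j As Long, k As Long
    Dim lngBest As Long, lngPos1 As Long, lngPos2 As Long

    For i = 1 To Len(s1)
        For j = 1 To Len(s2)
            ' Length of the common run starting at s1(i) and s2(j).
            k = 0
            Do While i + k <= Len(s1) And j + k <= Len(s2)
                If Mid$(s1, i + k, 1) <> Mid$(s2, j + k, 1) Then Exit Do
                k = k + 1
            Loop
            If k > lngBest Then
                lngBest = k
                lngPos1 = i
                lngPos2 = j
            End If
        Next j
    Next i

    If lngBest = 0 Then Exit Function   ' nothing in common: returns 0

    MatchedChars = lngBest _
        + MatchedChars(Left$(s1, lngPos1 - 1), Left$(s2, lngPos2 - 1)) _
        + MatchedChars(Mid$(s1, lngPos1 + lngBest), Mid$(s2, lngPos2 + lngBest))
End Function
```

For example, "WIKIMEDIA" and "WIKIMANIA" share "WIKIM" and then "IA", 7 characters in all. The score is therefore 2 x 7 / 18, about 0.78.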
Answer by Tony Toews
Also consider matching on the first two or three letters of the first name and last name. In a database I had with 10,000 names, Jo Sm (Joe/John/Joan Smith) returned only three or four records.
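The prefix approach maps onto a Like filter. Below is a minimal sketch that assumes hypothetical fields named FirstName and LastName; BuildPrefixCriteria and PrefixPattern are also hypothetical names. It builds a criteria string, without the word WHERE, that can serve as a form filter or a query's WHERE clause. The * wildcard is what Access uses in its default (ANSI-89) query mode. ADO connections expect % instead.

```vba
Option Explicit

' Hypothetical field names (FirstName, LastName); adjust to your table.
' Returns criteria without the word WHERE, e.g. for a form filter.
Public Function BuildPrefixCriteria(ByVal strFirst As String, ByVal strLast As String, _
                                    Optional ByVal lngChars As Long = 2) As String
    BuildPrefixCriteria = "FirstName Like " & PrefixPattern(strFirst, lngChars) & _
                          " AND LastName Like " & PrefixPattern(strLast, lngChars)
End Function

' Builds a quoted Like pattern from the first lngChars characters of a name.
' Single quotes are doubled for SQL; Like wildcards that appear inside the
' name itself (*, ?, #, [) are not escaped in this sketch.
Private Function PrefixPattern(ByVal strName As String, ByVal lngChars As Long) As String
    Dim strPrefix As String
    strPrefix = Left$(Trim$(strName), lngChars)
    PrefixPattern = "'" & Replace(strPrefix, "'", "''") & "*'"
End Function
```

For example, BuildPrefixCriteria("Joe", "Smith") returns FirstName Like 'Jo*' AND LastName Like 'Sm*'. That string could be passed as the WhereCondition argument of DoCmd.OpenForm for a hypothetical form such as frmPeople.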
Also consider what kinds of first names you will get. Will people search using shortened versions? For example, my legal first name is Anthony, but I'm always called Tony.
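One way to handle short forms is to map nicknames to a single canonical form before comparing. Below is a minimal sketch with a hypothetical function name, CanonicalFirstName, and a small hard-coded list of common English nicknames. A real application would more likely keep these pairs in a lookup table so they can be edited without changing code.

```vba
Option Explicit

' Hypothetical, hard-coded nickname list for illustration only;
' a real application would more likely keep these pairs in a table.
Public Function CanonicalFirstName(ByVal strFirst As String) As String
    Dim strName As String
    strName = UCase$(Trim$(strFirst))

    Select Case strName
        Case "TONY": CanonicalFirstName = "ANTHONY"
        Case "BOB", "BOBBY", "ROB": CanonicalFirstName = "ROBERT"
        Case "BILL", "BILLY", "WILL": CanonicalFirstName = "WILLIAM"
        Case "JOE", "JOEY": CanonicalFirstName = "JOSEPH"
        Case Else: CanonicalFirstName = strName   ' unknown names pass through
    End Select
End Function
```

Comparing CanonicalFirstName of the search term with CanonicalFirstName of the stored value lets "Tony" and "Anthony" match each other.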
Answer by itsmatt
You are looking for SOUNDEX.