Original URL: http://stackoverflow.com/questions/3585246/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I use jaro-winkler to find the closest value in a table?
Question by abhi
I have an implementation of the Jaro-Winkler algorithm in my database (I did not write this function). It compares two values and returns the probability of a match.
So jaro(string1, string2, matchnoofchars) will return a result.
Instead of comparing two strings, I want to pass in a single string and a matchnoofchars value and get back a result set of all values whose match probability is higher than 95%.
For example, the current function returns 97.62% for jaro("Philadelphia","Philadelphlaa",9).
I wish to tweak this function so that I am able to find "Philadelphia" for an input of "Philadelphlaa". What kind of changes do I need to make for this to happen?
I am using Oracle 9i.
Accepted answer by abhi
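DECLARE
  -- Reference list of correctly spelled cities for the state
  CURSOR citynames IS
    SELECT city FROM table_loc_master where statecode = 'PQ';
  -- Cities in table_loc that have no exact match in the master table
  CURSOR leasecity IS
    SELECT city FROM table_loc where State = 'PQ'
    MINUS
    SELECT to_char(city) city FROM table_loc_master where statecode = 'PQ';
  xProb NUMBER(10,8);
BEGIN
  -- Compare every unmatched city against every master city
  -- (in SQL*Plus, run SET SERVEROUTPUT ON first to see the DBMS_OUTPUT lines)
  FOR x_rec IN leasecity
  LOOP
    FOR y_rec IN citynames
    LOOP
      xProb := jwrun(x_rec.city, y_rec.city, length(y_rec.city));
      If xProb > 0.97 Then
        DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city);
      End if;
    END LOOP;
  END LOOP;
END;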
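The control flow of the PL/SQL block above can also be sketched in Python. This is a minimal illustration under stated assumptions. The in-memory lists stand in for table_loc and table_loc_master. Because jwrun is the asker's own function and its code is not shown, the scorer is passed in as a parameter. The example run plugs in difflib's SequenceMatcher ratio, which is a different similarity measure from Jaro-Winkler, so it uses a lower cut-off than the answer's 0.97. The names fuzzy_pairs and sequence_ratio are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple


def sequence_ratio(a: str, b: str) -> float:
    """Stand-in scorer: difflib's ratio, NOT Jaro-Winkler."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_pairs(
    source_cities: Sequence[str],
    master_cities: Sequence[str],
    score: Callable[[str, str], float],
    threshold: float = 0.97,
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs scoring above threshold, like the PL/SQL loop."""
    # MINUS step: distinct source cities with no exact match in the master list
    # (sorted only to make the output order deterministic).
    unmatched = sorted(set(source_cities) - set(master_cities))
    pairs = []
    for src in unmatched:          # outer cursor: leasecity
        for tgt in master_cities:  # inner cursor: citynames
            if score(src, tgt) > threshold:
                pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    lease = ["Philadelphlaa", "Pittsburgh"]             # hypothetical table_loc rows
    master = ["Philadelphia", "Pittsburgh", "Phoenix"]  # hypothetical master rows
    # difflib scores Philadelphlaa/Philadelphia at 0.88, so the cut-off is lowered.
    print(fuzzy_pairs(lease, master, sequence_ratio, threshold=0.85))
    # [('Philadelphlaa', 'Philadelphia')]
```

The nested loop scores every unmatched city against every master city. The cost therefore grows with the product of the two row counts, which is what the faster variant further down tries to reduce.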
Answer by TTT
Do you have a list of words that includes words like "Philadelphia"?
And who wrote that function?
Oracle has the utl_match package for fuzzy text comparison: http://download.oracle.com/docs/cd/E14072_01/appdev.112/e10577/u_match.htm
Can't you just do something like this?
select w1.word from words w1 where jaro(w1.word, 'Philadelphlaa', 9) >= 0.95;
This returns 'Philadelphia' if that word is present in the words table.
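To see what such a query computes, here is a minimal Python sketch of the textbook Jaro-Winkler similarity (prefix scale 0.1, common prefix capped at 4 characters), applied as a threshold filter over a list of candidate words. It does not reproduce the asker's jaro function. The question never explains that function's matchnoofchars parameter, and this sketch has no equivalent for it. The names jaro_winkler and find_matches and the word list are hypothetical.

```python
from typing import List, Sequence, Tuple


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are at most this far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order: each mismatched
    # position counts as half a transposition.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    # Applied unconditionally here; some implementations only add the bonus
    # when the Jaro score is already above 0.7.
    return sim + prefix * scale * (1 - sim)


def find_matches(query: str, candidates: Sequence[str],
                 threshold: float = 0.95) -> List[Tuple[str, float]]:
    """Python analogue of: select word from words where jw(word, query) >= threshold."""
    scored = [(c, jaro_winkler(query, c)) for c in candidates]
    hits = [(c, s) for c, s in scored if s >= threshold]
    return sorted(hits, key=lambda cs: cs[1], reverse=True)


if __name__ == "__main__":
    words = ["Pittsburgh", "Philadelphia", "Phoenix"]  # hypothetical table contents
    for word, score in find_matches("Philadelphlaa", words):
        print(f"{word}: {score:.4f}")  # Philadelphia: 0.9526
```

With these textbook parameters, 'Philadelphlaa' against 'Philadelphia' scores about 0.953. That is lower than the 97.62% reported for the asker's function. A 0.95 cut-off still finds the word, but the 0.97 cut-off from the accepted answer would not, so the threshold has to be tuned to whichever implementation is in use.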
Answer by TTT
A little dirty but faster (untested!).
Let's assume the first three characters are the same and the length is also approximately the same (within two characters).
DECLARE
  -- Only master cities that share the source city's first three characters
  -- and whose length is within two characters of it
  CURSOR citynames(cp_start in varchar2, cp_length in number) IS
    SELECT city FROM table_loc_master where statecode = 'PQ'
      and city like cp_start || '%'
      and length(city) between cp_length - 2 and cp_length + 2;
  -- Cities in table_loc that have no exact match in the master table
  CURSOR leasecity IS
    SELECT city FROM table_loc where State = 'PQ'
    MINUS
    SELECT to_char(city) city FROM table_loc_master where statecode = 'PQ';
  xProb NUMBER(10,8);
BEGIN
  FOR x_rec IN leasecity
  LOOP
    -- Open the filtered cursor with this source city's prefix and length
    FOR y_rec IN citynames(substr(x_rec.city, 1, 3), length(x_rec.city))
    LOOP
      xProb := jwrun(x_rec.city, y_rec.city, length(y_rec.city));
      If xProb > 0.97 Then
        DBMS_OUTPUT.PUT_LINE('Source : ' || x_rec.city || ' Target: ' || y_rec.city);
      End if;
    END LOOP;
  END LOOP;
END;
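The candidate filter in the citynames cursor (same first three characters, length within two characters of the source) is a form of blocking. It cuts down how many pairs ever reach the expensive jwrun call. Below is a minimal Python sketch of just that filter. The name blocked_candidates and the sample city list are hypothetical.

```python
from typing import List, Sequence


def blocked_candidates(source: str, targets: Sequence[str],
                       prefix_len: int = 3, slack: int = 2) -> List[str]:
    """Keep only targets that share the source's first prefix_len characters
    and whose length is within +/- slack of the source's length
    (the same conditions as the citynames cursor's WHERE clause)."""
    prefix = source[:prefix_len]
    lo, hi = len(source) - slack, len(source) + slack
    # startswith is a literal comparison; Oracle's LIKE would treat % and _
    # in the prefix as wildcards.
    return [t for t in targets if t.startswith(prefix) and lo <= len(t) <= hi]


if __name__ == "__main__":
    master = ["Philadelphia", "Phoenix", "Pittsburgh", "Philipsburg", "Phil"]
    # Only these survive to the similarity call:
    print(blocked_candidates("Philadelphlaa", master))  # ['Philadelphia', 'Philipsburg']
    # A typo inside the prefix hides the right answer entirely:
    print(blocked_candidates("Pjiladelphia", master))   # []
```

The second call shows why the answer calls this "a little dirty". Any misspelling in the first three characters, such as 'Pjiladelphia', removes the correct city before it is ever scored, so the speed-up comes at the cost of some missed matches.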