Python: How to use SequenceMatcher to find similarity between two strings?
Source: http://stackoverflow.com/questions/4802137/ (Stack Overflow, licensed under CC BY-SA 4.0; if you reuse this content you must attribute it to the original authors)
Asked by joolie
import difflib
a='abcd'
b='ab123'
seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower())
seq=difflib.SequenceMatcher(a,b)
d=seq.ratio()*100
print d
I used the code above, but the output I get is 0.0. How can I get a valid answer?
Accepted answer by Lennart Regebro
You forgot the first parameter to SequenceMatcher, isjunk. Pass None for it so that a and b end up in the right positions.
>>> import difflib
>>>
>>> a='abcd'
>>> b='ab123'
>>> seq=difflib.SequenceMatcher(None, a,b)
>>> d=seq.ratio()*100
>>> print d
44.4444444444
Answer by Tim
From the docs:
The SequenceMatcher class has this constructor:
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
The problem in your code is that by doing
seq=difflib.SequenceMatcher(a,b)
you are passing a as the value for isjunk and b as the value for a, leaving b at its default value of ''. With an empty second sequence nothing can match, so the ratio is 0.0.
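To make the argument binding visible, here is a minimal sketch in Python 3 syntax (the original snippets use the Python 2 print statement). It assumes the matcher exposes the stored sequences as the attributes a and b, which is how CPython's difflib keeps them; the variable names wrong and right are only illustrative.

```python
import difflib

a = 'abcd'
b = 'ab123'

# Positional call: 'abcd' is bound to isjunk, 'ab123' to a, and b stays ''.
wrong = difflib.SequenceMatcher(a, b)
print(repr(wrong.a), repr(wrong.b))
print(wrong.ratio())

# Keyword call: isjunk keeps its default of None and both sequences are set.
right = difflib.SequenceMatcher(a=a, b=b)
print(right.ratio() * 100)
```

The first matcher compares 'ab123' against an empty string, which is why its ratio comes out as zero, while the second compares the two intended strings.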
One way to overcome this (already mentioned by Lennart) is to explicitly pass None as an extra first parameter, so that a and b are assigned to the correct parameters.
However, I just found another solution worth mentioning: it leaves the isjunk argument untouched and uses the set_seqs() method to specify the two sequences.
>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio()*100
>>> print d
44.44444444444444
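The same set_seqs() approach can be wrapped in a small helper. This is only a sketch in Python 3 syntax; the function name similarity_percent is hypothetical and not part of difflib, and lowercasing both inputs simply mirrors the a.lower() and b.lower() calls above.

```python
import difflib


def similarity_percent(first, second):
    """Return the SequenceMatcher ratio of two strings as a percentage."""
    matcher = difflib.SequenceMatcher()  # isjunk keeps its default of None
    # Compare case-insensitively, as in the session above.
    matcher.set_seqs(first.lower(), second.lower())
    return matcher.ratio() * 100


print(similarity_percent('abcd', 'ab123'))
```

Creating a new matcher on every call keeps the helper simple; if one string is compared against many others, the difflib documentation suggests setting it once as the second sequence and changing only the first.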

