Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4802137/

Time: 2020-08-18 17:35:41  Source: igfitidea

How to use SequenceMatcher to find similarity between two strings?

python difflib

Asked by joolie

import difflib

a='abcd'
b='ab123'
seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower())
seq=difflib.SequenceMatcher(a,b)
d=seq.ratio()*100
print d

I used the code above, but the output I get is 0.0. How can I get a valid result?

Accepted answer by Lennart Regebro

You forgot the first parameter to SequenceMatcher.

>>> import difflib
>>> 
>>> a='abcd'
>>> b='ab123'
>>> seq=difflib.SequenceMatcher(None, a,b)
>>> d=seq.ratio()*100
>>> print d
44.4444444444

http://docs.python.org/library/difflib.html

Answer by Tim

From the docs:

The SequenceMatcher class has this constructor:

class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

The problem in your code is that by doing

seq=difflib.SequenceMatcher(a,b)

you are passing a as the value for isjunk and b as the value for a, leaving the default '' value for b. Comparing 'ab123' against an empty string finds no matches, which results in a ratio of 0.0.
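
To see the mis-binding concretely, here is a small Python 3 sketch. It reads the `a` and `b` attributes in which CPython's SequenceMatcher stores its two sequences; relying on those attributes is an assumption on my part, not something stated in the answer.

```python
import difflib

a = 'abcd'
b = 'ab123'

# Two positional arguments fill isjunk and a; b keeps its default ''.
seq = difflib.SequenceMatcher(a, b)

print(repr(seq.a))  # the first sequence that is actually compared
print(repr(seq.b))  # the second sequence that is actually compared
print(seq.ratio())  # nothing can match an empty second sequence
```

The string passed as isjunk is never called here, because the second sequence is empty and there are no elements to test for junk.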

One way to overcome this (already mentioned by Lennart) is to explicitly pass None as an extra first parameter, so that the two strings are assigned to a and b as intended.
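
As a rough Python 3 sketch of that fix: the first call passes None positionally, and the second names a and b by keyword instead. The keyword form is my own addition based on the constructor signature quoted above, and the variable names are only illustrative.

```python
import difflib

a = 'abcd'
b = 'ab123'

# Option 1: fill the isjunk slot explicitly with None.
positional = difflib.SequenceMatcher(None, a, b)

# Option 2: name the sequences so isjunk keeps its default.
by_keyword = difflib.SequenceMatcher(a=a, b=b)

# ratio() is 2 * matches / total length; here 'ab' matches, so 2 * 2 / 9.
print(positional.ratio() * 100)
print(by_keyword.ratio() * 100)
```

Both calls compare the same two strings, so they report the same ratio.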

However, I just found another solution that I wanted to mention: it doesn't touch the isjunk argument, but uses the set_seqs() method to specify the two sequences.

>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio()*100
>>> print d
44.44444444444444
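
For repeated use, the set_seqs() approach could be wrapped in a small function. This is only a sketch in Python 3; `similarity_percent` is a hypothetical helper name and not part of difflib.

```python
import difflib


def similarity_percent(first, second):
    """Return the SequenceMatcher ratio of two strings as a percentage."""
    matcher = difflib.SequenceMatcher()  # isjunk stays at its default, None
    # Lower-case both inputs, as in the snippet above, so case is ignored.
    matcher.set_seqs(first.lower(), second.lower())
    return matcher.ratio() * 100


print(similarity_percent('abcd', 'ab123'))
print(similarity_percent('ABCD', 'abcd'))
```

Because both inputs are lower-cased first, strings that differ only in case come out as fully similar.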