Python - difference between two strings
Original question: http://stackoverflow.com/questions/17904097/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by user2626682
I'd like to store a lot of words in a list. Many of these words are very similar. For example, I have the word afrykanerskojęzyczny and many words like afrykanerskojęzycznym, afrykanerskojęzyczni, nieafrykanerskojęzyczni. What is an efficient (fast, and giving a small diff size) way to find the difference between two strings and then restore the second string from the first one and the diff?
Answer by perreal
You can look into the regex module (its fuzzy matching section). I don't know if you can get the actual differences, but at least you can specify the allowed number of each type of change, such as insertions, deletions and substitutions:
import regex

sequence = 'afrykanerskojezyczny'
queries = ['afrykanerskojezycznym', 'afrykanerskojezyczni',
           'nieafrykanerskojezyczni']
for q in queries:
    # {e<=2}: the query may match with at most 2 errors in total
    # (insertions, deletions and substitutions combined)
    m = regex.search(r'(%s){e<=2}' % q, sequence)
    print('match' if m else 'nomatch')
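If installing regex is not an option, or you want the actual number of edits rather than a match/no-match answer, a plain Levenshtein distance gives the minimum number of single-character insertions, deletions and substitutions between two words. This is a swapped-in technique, not part of the regex module, and it compares whole strings, which is closer to regex.fullmatch than to regex.search. A minimal sketch; edit_distance is a hypothetical helper name:

```python
def edit_distance(a, b):
    """Levenshtein distance (hypothetical helper, not from the regex module):
    minimum number of single-character insertions, deletions and
    substitutions that turn a into b. Compares whole strings."""
    prev = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


sequence = 'afrykanerskojezyczny'
queries = ['afrykanerskojezycznym', 'afrykanerskojezyczni',
           'nieafrykanerskojezyczni']
for q in queries:
    d = edit_distance(sequence, q)
    print(q, d, 'match' if d <= 2 else 'nomatch')
```

The two-row table keeps memory proportional to the length of the second word, which is plenty for dictionary-sized strings.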
Answer by Elias Benevedes
The answer to my comment above on the original question makes me think this is all he wants:
word = 'afrykanerskojęzyczny'
wordlist = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni', 'nieafrykanerskojęzyczni']
# Overwrite every entry of wordlist with the base word
for loopnum in range(len(wordlist)):
    wordlist[loopnum] = word
This will do the following:
For every entry in wordlist, replace that entry with the original word.
All you have to do is put this piece of code where you need to change wordlist, making sure that the words you need to change are stored in wordlist and that the original word is correct.
Hope this helps!
Answer by dawg
You can use ndiff in the difflib module to do this. Its output has all the information needed to convert one string into the other.
A simple example:
import difflib

cases = [('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
         ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
         ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
         ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
         ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
         ('abcdefg', 'xac')]

for a, b in cases:
    print('{} => {}'.format(a, b))
    # Each ndiff line is '  c' (unchanged), '- c' (only in a) or '+ c' (only in b).
    # i counts every ndiff line, unchanged characters included, so it is an
    # index into the diff output rather than into a or b.
    for i, s in enumerate(difflib.ndiff(a, b)):
        if s[0] == ' ':
            continue
        elif s[0] == '-':
            print('Delete "{}" from position {}'.format(s[-1], i))
        elif s[0] == '+':
            print('Add "{}" to position {}'.format(s[-1], i))
    print()
prints:
afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20

afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2

afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20

nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2

nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16

abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7
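The ndiff output is also enough to rebuild either string: difflib.restore takes an ndiff delta and yields the elements (here, single characters) of the first or second sequence. A minimal sketch; note that the delta repeats every unchanged character, so it is convenient for reconstruction but is not a compact storage format:

```python
import difflib

a = 'afrykanerskojęzyczny'
b = 'nieafrykanerskojęzyczni'

# Materialize the delta as a list so it can be consumed more than once
delta = list(difflib.ndiff(a, b))

# which=2 rebuilds the second sequence, which=1 the first
print(''.join(difflib.restore(delta, 2)))  # nieafrykanerskojęzyczni
print(''.join(difflib.restore(delta, 1)))  # afrykanerskojęzyczny
```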
Answer by Craig Silverstein
What you are asking for is a specialized form of compression. xdelta3 was designed for this particular kind of compression, and there's a Python binding for it, but you could probably get away with using zlib directly. You'd want to use zlib.compressobj and zlib.decompressobj with the zdict parameter set to your "base word", e.g. afrykanerskojęzyczny.
Caveats: zdict is only supported in Python 3.3 and higher, and it's easiest to code if you use the same "base word" for all your diffs, which may or may not be what you want.
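A minimal sketch of the zdict approach, assuming Python 3.3+ and that words are encoded as UTF-8 (zdict must be bytes); compress_word and decompress_word are hypothetical helper names. The zlib container adds about 10 bytes of fixed overhead (2-byte header, 4-byte dictionary ID, 4-byte Adler-32 checksum), so for words this short the output is not guaranteed to be smaller than the word itself; measure on your own data.

```python
import zlib

# Assumption: the shared "base word" is used as the preset dictionary, as UTF-8 bytes
BASE = 'afrykanerskojęzyczny'.encode('utf-8')


def compress_word(word, base=BASE):
    """Deflate word with base as a preset dictionary (hypothetical helper)."""
    comp = zlib.compressobj(zdict=base)
    return comp.compress(word.encode('utf-8')) + comp.flush()


def decompress_word(data, base=BASE):
    """Inverse of compress_word; the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=base)
    return (decomp.decompress(data) + decomp.flush()).decode('utf-8')


for w in ['afrykanerskojęzycznym', 'afrykanerskojęzyczni', 'nieafrykanerskojęzyczni']:
    blob = compress_word(w)
    assert decompress_word(blob) == w  # round trip restores the word exactly
    print(w, len(w.encode('utf-8')), '->', len(blob), 'bytes')
```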
Answer by Eric
I like the ndiff answer, but if you want to collect only the changes into a list, you could do something like:
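import difflib

case_a = 'afrykbnerskojęzyczny'
case_b = 'afrykanerskojęzycznym'
# Keep only the ndiff lines that do not start with a space, i.e. the changed characters
output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
print(output_list)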
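A list of changed characters alone does not record where the changes happen, so it cannot rebuild case_b by itself. If the goal from the question is a small diff from which the second string can be restored, one option is to keep only the non-equal opcodes from difflib.SequenceMatcher as (start, end, replacement) tuples against the first string. A minimal sketch; make_diff and apply_diff are hypothetical helper names, not part of difflib:

```python
import difflib


def make_diff(a, b):
    """Hypothetical helper: list of (start, end, replacement) tuples meaning
    'replace a[start:end] with replacement'; unchanged spans are not stored."""
    # autojunk=False so long inputs are not affected by the popular-element heuristic
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


def apply_diff(a, diff):
    """Hypothetical helper: rebuild b from a and the output of make_diff."""
    parts = []
    pos = 0
    for i1, i2, replacement in diff:
        parts.append(a[pos:i1])    # unchanged text before this edit
        parts.append(replacement)  # inserted or substituted text ('' for a deletion)
        pos = i2                   # skip the deleted or replaced span of a
    parts.append(a[pos:])
    return ''.join(parts)


base = 'afrykanerskojęzyczny'
for word in ['afrykanerskojęzycznym', 'afrykanerskojęzyczni', 'nieafrykanerskojęzyczni']:
    d = make_diff(base, word)
    assert apply_diff(base, d) == word  # the diff plus the base restores the word
    print(word, d)
```

Each stored diff only holds the positions and the new characters, so for words that share a long stem with the base word it stays much shorter than the word itself.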