Python - Difference between two strings

Question

Asked by user2626682

I'd like to store a lot of words in a list. Many of these words are very similar. For example, I have the word afrykanerskojęzyczny and many words like afrykanerskojęzycznym, afrykanerskojęzyczni, nieafrykanerskojęzyczni. What is an efficient solution (fast and giving a small diff size) to find the difference between two strings and restore the second string from the first one and the diff?

Answer 1

Answered by perreal

You can look into the regex module (the fuzzy section). I don't know if you can get the actual differences, but at least you can specify the allowed number of changes of each type, such as insertions, deletions, and substitutions:

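import regex  # third-party module (pip install regex), not the stdlib re

sequence = 'afrykanerskojezyczny'
queries = ['afrykanerskojezycznym', 'afrykanerskojezyczni',
           'nieafrykanerskojezyczni']
for q in queries:
    # {e<=2} allows at most two errors (insertions, deletions, substitutions)
    m = regex.search(r'(%s){e<=2}' % q, sequence)
    print('match' if m else 'nomatch')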

Answer 2

Answered by Elias Benevedes

The answer to my comment above on the Original Question makes me think this is all he wants:

word = 'afrykanerskojęzyczny'
wordlist = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni', 'nieafrykanerskojęzyczni']
# Overwrite every entry in wordlist with the base word.
for index in range(len(wordlist)):
    wordlist[index] = word

This does the following:

For every value in wordlist, it sets that value to the original word.

All you have to do is put this piece of code where you need to change wordlist, making sure you store the words you need to change in wordlist, and that the original word is correct.

Hope this helps!

Answer 3

Answered by dawg

You can use ndiff in the difflib module to do this. It has all the information necessary to convert one string into another string.

A simple example:

import difflib

cases = [('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
         ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
         ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
         ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
         ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
         ('abcdefg', 'xac')]

for a, b in cases:
    print('{} => {}'.format(a, b))
    # ndiff compares the strings character by character; each item is
    # a two-character tag ('  ', '- ' or '+ ') followed by the character.
    # i is the index into the ndiff output, not into either string.
    for i, s in enumerate(difflib.ndiff(a, b)):
        if s[0] == ' ':
            continue
        elif s[0] == '-':
            print(u'Delete "{}" from position {}'.format(s[-1], i))
        elif s[0] == '+':
            print(u'Add "{}" to position {}'.format(s[-1], i))
    print()

prints:

afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20

afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2

afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20

nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2

nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16

abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7
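
Since the question also asks how to get the second string back, here is a minimal sketch of the restore step using difflib.restore on the full ndiff output. The variable names are only illustrative. Note that this relies on keeping the complete delta (unchanged characters included), so it shows that ndiff carries enough information rather than giving a small diff.

```python
import difflib

a = 'afrykanerskojęzyczny'
b = 'nieafrykanerskojęzyczni'

# Keep the complete delta, including the unchanged ('  ') entries.
delta = list(difflib.ndiff(a, b))

# restore(delta, 1) rebuilds the first sequence, restore(delta, 2) the second.
restored_a = ''.join(difflib.restore(delta, 1))
restored_b = ''.join(difflib.restore(delta, 2))

print(restored_a == a, restored_b == b)
```

Because the full delta holds every character of both strings, it is larger than either word; a changes-only list needs positions attached before it can be applied.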

Answer 4

Answered by Craig Silverstein

What you are asking for is a specialized form of compression. xdelta3 was designed for this particular kind of compression, and there's a Python binding for it, but you could probably get away with using zlib directly. You'd want to use zlib.compressobj and zlib.decompressobj with the zdict parameter set to your "base word", e.g. afrykanerskojęzyczny.

Caveats: zdict is only supported in Python 3.3 and higher, and it's easiest to code if you have the same "base word" for all your diffs, which may or may not be what you want.
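
A minimal sketch of the zdict idea, assuming a single shared base word and UTF-8 encoding. The helper names make_delta and apply_delta are made up for this illustration and are not part of zlib.

```python
import zlib

def make_delta(base, word):
    """Compress word, using base as the preset dictionary (zdict)."""
    comp = zlib.compressobj(level=9, zdict=base.encode('utf-8'))
    return comp.compress(word.encode('utf-8')) + comp.flush()

def apply_delta(base, delta):
    """Rebuild the word from its delta; the same base must be supplied."""
    decomp = zlib.decompressobj(zdict=base.encode('utf-8'))
    return (decomp.decompress(delta) + decomp.flush()).decode('utf-8')

base = 'afrykanerskojęzyczny'
words = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni',
         'nieafrykanerskojęzyczni']

deltas = [make_delta(base, w) for w in words]
restored = [apply_delta(base, d) for d in deltas]
print(restored == words)
```

For words this short, the fixed overhead of the zlib stream may outweigh the savings, so it is worth measuring the delta sizes on real data before committing to this approach.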

Answer 5

Answered by Eric

I like the ndiff answer, but if you want to spit it all out into a list of only the changes, you could do something like:

import difflib

case_a = 'afrykbnerskojęzyczny'
case_b = 'afrykanerskojęzycznym'

# Keep only the '- ' and '+ ' entries; unchanged characters start with ' '.
output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
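
A changes-only list like the one above drops the positions, so on its own it cannot rebuild the second string. As a sketch of one way to keep the diff small and still reversible, the following uses difflib.SequenceMatcher.get_opcodes and stores only the non-equal spans. The helpers make_diff and apply_diff are hypothetical names introduced here, not difflib functions.

```python
import difflib

def make_diff(a, b):
    """Return (start, end, replacement) edits that turn a into b."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']

def apply_diff(a, diff):
    """Rebuild b from a and the edits produced by make_diff."""
    parts = []
    pos = 0
    for start, end, replacement in diff:
        parts.append(a[pos:start])   # unchanged text before this edit
        parts.append(replacement)    # text that replaces a[start:end]
        pos = end
    parts.append(a[pos:])            # unchanged tail
    return ''.join(parts)

base = 'afrykanerskojęzyczny'
words = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni',
         'nieafrykanerskojęzyczni']

diffs = [make_diff(base, w) for w in words]
restored = [apply_diff(base, d) for d in diffs]
print(restored == words)
```

Each edit stores two indices into the base word plus the replacement text, so near-identical words cost only a few small tuples.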
