Python difflib: highlighting differences inline?

Question

Asked by AnC

When comparing similar lines, I want to highlight the differences on the same line:

a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet

lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet

While difflib.HtmlDiff appears to do this sort of inline highlighting, it produces very verbose markup.

Unfortunately, I have not been able to find another class or method that works within a line rather than line by line.

Am I missing anything? Any pointers would be appreciated!

Answer 1

Answered by tzot

For your simple example:

import difflib

def show_diff(seqm):
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            # Unchanged text is copied through as-is.
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            # Text present only in b.
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            # Text present only in a.
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError("what to do with 'replace' opcode?")
        else:
            raise RuntimeError("unexpected opcode")
    return ''.join(output)

>>> sm = difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'

This works with strings. You should decide what to do with "replace" opcodes.
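
One possible way to handle "replace" opcodes, offered only as a sketch: treat a replacement as a deletion followed by an insertion. The function name show_diff_with_replace is made up for this illustration and is not part of the answer above or of difflib. The text is not HTML-escaped here.

```python
import difflib

def show_diff_with_replace(a, b):
    """Mark up string a with <ins>/<del> tags showing how it becomes b.

    Hypothetical variant of show_diff: a 'replace' opcode is rendered
    as the deleted text followed by the inserted text.
    """
    seqm = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(a[a0:a1])
        if opcode in ('delete', 'replace'):
            # Old text that disappears from a.
            output.append("<del>" + a[a0:a1] + "</del>")
        if opcode in ('insert', 'replace'):
            # New text that appears in b.
            output.append("<ins>" + b[b0:b1] + "</ins>")
    return ''.join(output)
```

With this choice, show_diff_with_replace("a", "b") returns '<del>a</del><ins>b</ins>'. Whether that is the right rendering depends on how you want replacements to look.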

Answer 2

Answered by orip

Here's an inline differ inspired by @tzot's answer above (also Python 3 compatible):

import difflib

def inline_diff(a, b):
    """Return a one-line, character-level diff of strings a and b."""
    matcher = difflib.SequenceMatcher(None, a, b)

    def process_tag(tag, i1, i2, j1, j2):
        # Render one opcode; i1:i2 indexes into a, j1:j2 into b.
        if tag == 'replace':
            return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
        if tag == 'delete':
            return '{- ' + matcher.a[i1:i2] + '}'
        if tag == 'equal':
            return matcher.a[i1:i2]
        if tag == 'insert':
            return '{+ ' + matcher.b[j1:j2] + '}'
        assert False, "Unknown tag %r" % tag

    return ''.join(process_tag(*t) for t in matcher.get_opcodes())

It's not perfect. For example, it would be nice to expand 'replace' opcodes so they cover the whole replaced word instead of just the few letters that differ, but it's a good place to start.

Sample output:

>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
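
The whole-word behaviour wished for above can be approximated by comparing lists of words instead of characters, since SequenceMatcher accepts any sequences of hashable items. The following is a minimal sketch under that assumption. The name inline_word_diff is hypothetical, and splitting on whitespace means runs of spaces collapse to single spaces in the output.

```python
import difflib

def inline_word_diff(a, b):
    """Word-level inline diff using the same {+ }/{- }/{x -> y} notation.

    Hypothetical helper: compares whitespace-separated words, so the
    original spacing between words is not preserved.
    """
    a_words = a.split()
    b_words = b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = ' '.join(a_words[i1:i2])
        new = ' '.join(b_words[j1:j2])
        if tag == 'equal':
            pieces.append(old)
        elif tag == 'insert':
            pieces.append('{+ ' + new + '}')
        elif tag == 'delete':
            pieces.append('{- ' + old + '}')
        elif tag == 'replace':
            pieces.append('{' + old + ' -> ' + new + '}')
    return ' '.join(pieces)
```

For the same a and b as in the sample above, this should print:

Lorem {+ bananas} ipsum {dolor -> cabbage} sit amet {- consectetur} adipiscing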

Answer 3

Answered by Adam

difflib.SequenceMatcher will operate on single lines. You can use the "opcodes" to determine how to change the first line into the second.
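
To make the meaning of the opcodes concrete, here is a small sketch that walks them to rebuild the second line from the first. The function name apply_opcodes is invented for this illustration; only SequenceMatcher and get_opcodes come from difflib.

```python
import difflib

def apply_opcodes(a, b):
    """Rebuild string b from string a by following the opcodes.

    Hypothetical helper that shows what each opcode tag means.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Keep this stretch of the first line unchanged.
            result.append(a[i1:i2])
        elif tag in ('insert', 'replace'):
            # Take the new text from the second line.
            result.append(b[j1:j2])
        # 'delete': a[i1:i2] is dropped, so nothing is appended.
    return ''.join(result)
```

By construction, apply_opcodes(a, b) returns b for any two strings. To highlight the changes instead, wrap the inserted and deleted slices in markup rather than applying them silently.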
