Generating and applying diffs in Python

Question

Asked by noio

Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then later apply this diff to one file to obtain the other?

I want to keep the revision history of a text, but I don't want to save the entire text for each revision if only a single line was edited. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to modify one text to obtain the other.

Answer 1

Accepted answer by Density 21.5

Have you looked at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from older files and diffs.

A Python version is included.

http://code.google.com/p/google-diff-match-patch/

Answer 2

Answered by pwdyson

Does difflib.unified_diff do what you want? There is an example here.
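As a rough illustration of what difflib.unified_diff produces, here is a minimal sketch. The sample texts and the file labels old.txt and new.txt are made up for the example; n=0 asks for no context lines, so only the edited lines are kept.

```python
import difflib

# Hypothetical sample texts, invented for this illustration.
old = "one\ntwo\nthree\n"
new = "one\n2\nthree\nfour\n"

# unified_diff works on lists of lines; keepends=True preserves the newlines
# so the output lines can be joined back into a single patch string.
diff_lines = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="old.txt",   # label only, no file is read
    tofile="new.txt",     # label only, no file is read
    n=0,                  # zero context lines: keep just the changes
))

patch_text = "".join(diff_lines)
print(patch_text)
```

Note that difflib only generates the diff; the standard library has no function that applies a unified diff back to a text.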

Answer 3

Answered by Isaac Turner

I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the unified diff format.

import re

_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s,patch,revert=False):
  """
  Apply unified diff patch to string s to recover newer string.
  If revert is True, treat s as the newer string, recover older string.
  """
  s = s.splitlines(True)
  p = patch.splitlines(True)
  t = ''
  i = sl = 0
  (midx,sign) = (1,'+') if not revert else (3,'-')
  while i < len(p) and p[i].startswith(("---","+++")): i += 1 # skip header lines
  while i < len(p):
    m = _hdr_pat.match(p[i])
    if not m: raise Exception("Cannot process diff")
    i += 1
    l = int(m.group(midx))-1 + (m.group(midx+1) == '0')
    t += ''.join(s[sl:l])
    sl = l
    while i < len(p) and p[i][0] != '@':
      if i+1 < len(p) and p[i+1][0] == '\\': line = p[i][:-1]; i += 2
      else: line = p[i]; i += 1
      if len(line) > 0:
        if line[0] == sign or line[0] == ' ': t += line[1:]
        sl += (line[0] != sign)
  t += ''.join(s[sl:])
  return t

If there are header lines ("--- ...\n", "+++ ...\n"), it skips over them. If we have a unified diff string diffstr representing the diff between oldstr and newstr:

# recreate `newstr` from `oldstr`+patch
newstr = apply_patch(oldstr, diffstr)
# recreate `oldstr` from `newstr`+patch
oldstr = apply_patch(newstr, diffstr, True)

In Python you can generate a unified diff of two strings using difflib (part of the standard library):

import difflib
_no_eol = "\\ No newline at end of file"

def make_patch(a,b):
  """
  Get unified string diff between two strings. Trims top two lines.
  Returns empty string if strings are identical.
  """
  diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0)
  try: _,_ = next(diffs),next(diffs)
  except StopIteration: pass
  return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])

On Unix: diff -U0 a.txt b.txt

The code is on GitHub, along with tests using ASCII and random Unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc
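For comparison, the same "store only the edited lines" idea can be sketched without the unified diff format, using difflib.SequenceMatcher.get_opcodes from the standard library. This is an alternative illustration, not the code from the answer above; the names make_delta and apply_delta are hypothetical, and the delta produced here only goes forward (old to new).

```python
import difflib

def make_delta(old_lines, new_lines):
    """Keep only the non-equal opcodes, plus the lines that replace them.

    Each entry is (start, end, replacement): old_lines[start:end] is to be
    replaced by `replacement` (which is empty for a pure deletion).
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(old_lines, delta):
    """Rebuild the newer list of lines from the older one and a delta."""
    result, position = [], 0
    for start, end, replacement in delta:
        result.extend(old_lines[position:start])  # copy the unchanged run
        result.extend(replacement)                # splice in the new lines
        position = end                            # skip the replaced old lines
    result.extend(old_lines[position:])           # copy the unchanged tail
    return result

# Hypothetical sample texts, invented for this illustration.
old_lines = "a\nb\nc\n".splitlines(keepends=True)
new_lines = "a\nB\nc\nd\n".splitlines(keepends=True)

delta = make_delta(old_lines, new_lines)
rebuilt = apply_delta(old_lines, delta)
print(rebuilt == new_lines)
```

To go backwards as well, the delta would also have to store the removed old lines, which is what the "-" lines of a unified diff provide.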

Answer 4

Answered by jai

As far as I know, most diff algorithms use a simple Longest Common Subsequence match to find the common part between two texts; whatever is left is considered the difference. It shouldn't be too difficult to code up your own dynamic programming algorithm to do that in Python; the Wikipedia page on the longest common subsequence problem provides the algorithm too.
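The dynamic programming idea could look roughly like the sketch below. It is the textbook quadratic-time LCS table, not an optimized diff algorithm, and the function names lcs_table and line_diff are made up for this example.

```python
def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table

def line_diff(a, b):
    """Return (marker, line) pairs: ' ' common, '-' only in a, '+' only in b."""
    table = lcs_table(a, b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the common subsequence.
            result.append(("-", a[i]))
            i += 1
        else:
            result.append(("+", b[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    result.extend(("-", line) for line in a[i:])
    result.extend(("+", line) for line in b[j:])
    return result

# Hypothetical sample input, invented for this illustration.
old = ["one", "two", "three"]
new = ["one", "2", "three", "four"]
for marker, line in line_diff(old, new):
    print(marker, line)
```

Reading the result back, the ' ' and '-' entries reproduce the old text and the ' ' and '+' entries reproduce the new one, which is the property a patch needs.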

Answer 5

Answered by Karthik Hegde

You can probably use unified_diff to generate the list of differences between two files. Only the changed lines are then written to a new text file, which you can keep for future reference. The code below writes only the differences to the new file. I hope this is what you are asking for!

import difflib

# old_file and new_file are assumed to be lists of lines (e.g. from readlines())
diff = difflib.unified_diff(old_file, new_file, lineterm='')
lines = list(diff)[2:]  # drop the '---' and '+++' header lines

added = [line for line in lines if line[0] == '+']
removed = [line for line in lines if line[0] == '-']

# write the added lines first, then the removed lines
with open("output.txt", "w") as fh1:
    for line in added + removed:
        fh1.write(line)

print('+', added)
print('-', removed)

Use this in your code to save only the difference output!

Answer 6

Answered by Simon Callan

Does it have to be a Python solution?
My first thought would be to use either a version control system (Subversion, Git, etc.) or the diff/patch utilities that are standard on a Unix system and are available through Cygwin on a Windows-based system.
