Compare two files and report the differences in Python

Question

Asked by Matt

I have two files called "hosts" (in different directories).

I want to compare them using Python to see if they are identical. If they are not identical, I want to print the differences on the screen.

So far I have tried this:

hosts0 = open(dst1 + "/hosts","r") 
hosts1 = open(dst2 + "/hosts","r")

lines1 = hosts0.readlines()

for i,lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print "line ", i, " in hosts1 is different \n"
        print lines2
    else:
        print "same"

But when I run this, I get:

File "./audit.py", line 34, in <module>
  if lines2 != lines1[i]:
IndexError: list index out of range

This means one of the hosts files has more lines than the other. Is there a better way to compare two files and report the differences?

Answer 1

Accepted answer by rbutcher

import difflib

lines1 = '''
dog
cat
bird
buffalo
gophers
hound
horse
'''.strip().splitlines()

lines2 = '''
cat
dog
bird
buffalo
gopher
horse
mouse
'''.strip().splitlines()

# Changes:
# swapped positions of cat and dog
# changed gophers to gopher
# removed hound
# added mouse

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
    print(line)

Outputs the following:

--- file1
+++ file2
@@ -1,7 +1,7 @@
+cat
 dog
-cat
 bird
 buffalo
-gophers
-hound
+gopher
 horse
+mouse

This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.

You can use n=0 to remove the context.

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    print(line)

Outputting this:

--- file1
+++ file2
@@ -0,0 +1 @@
+cat
@@ -2 +2,0 @@
-cat
@@ -5,2 +5 @@
-gophers
-hound
+gopher
@@ -7,0 +7 @@
+mouse

But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.

for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
    for prefix in ('---', '+++', '@@'):
        if line.startswith(prefix):
            break
    else:
        print(line)

Giving us this output:

+cat
-cat
-gophers
-hound
+gopher
+mouse

Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:

diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
lines = list(diff)[2:]
added = [line[1:] for line in lines if line[0] == '+']
removed = [line[1:] for line in lines if line[0] == '-']

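print('additions:')
for line in added:
    print(line)
print('')  # blank separator line
print('additions, ignoring position:')
for line in added:
    # Skip lines that were also removed elsewhere, i.e. lines that only moved.
    if line not in removed:
        print(line)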

Outputting:

additions:
cat
gopher
mouse

additions, ignoring position:
gopher
mouse

You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
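
To tie this back to the original question (two "hosts" files in different directories), here is a minimal sketch that first checks whether the files are identical and only then builds a diff. It assumes Python 3; the function name `report_differences` and the temporary demo files are hypothetical, not part of the answer above.

```python
import difflib
import filecmp
import os
import tempfile


def report_differences(path_a, path_b):
    """Return unified-diff lines for two text files, or [] if they are identical."""
    # shallow=False compares file contents rather than trusting matching os.stat() signatures.
    if filecmp.cmp(path_a, path_b, shallow=False):
        return []
    # newline='' keeps the original line endings, so "\r\n" vs "\n" shows up as a change.
    with open(path_a, newline='') as file_a, open(path_b, newline='') as file_b:
        return list(difflib.unified_diff(
            file_a.readlines(), file_b.readlines(),
            fromfile=path_a, tofile=path_b,
        ))


# Hypothetical demo: two "hosts" files in separate temporary directories.
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    path_a = os.path.join(dir_a, 'hosts')
    path_b = os.path.join(dir_b, 'hosts')
    with open(path_a, 'w') as handle:
        handle.write('one\ntwo\ndogs\nthree\n')
    with open(path_b, 'w') as handle:
        handle.write('one\ntwo\nthree\n')
    diff = report_differences(path_a, path_b)
    print(''.join(diff) if diff else 'identical')
```

Returning a list rather than printing directly leaves the caller free to choose any of the output styles shown above.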

Answer 2

Answered by Raj

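# Raw strings keep "\a" and "\b" from being read as escape sequences.
hosts0 = open(r"C:path\a.txt", "r")
hosts1 = open(r"C:path\b.txt", "r")

lines1 = hosts0.readlines()

# Note: this still raises IndexError if hosts1 has more lines than hosts0.
for i, lines2 in enumerate(hosts1):
    if lines2 != lines1[i]:
        print("line %d in hosts1 is different\n" % i)
        print(lines2)
    else:
        print("same")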

The above code is working for me. Can you please indicate what error you are facing?

Answer 3

Answered by rbutcher

The difflib library is useful for this, and comes in the standard library. I like the unified diff format.

http://docs.python.org/2/library/difflib.html#difflib.unified_diff

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
        )
        for line in diff:
            sys.stdout.write(line)

Outputs:

--- hosts0
+++ hosts1
@@ -1,5 +1,4 @@
 one
 two
-dogs
 three

And here is a dodgy version that ignores certain lines. There might be edge cases that don't work, and there are surely better ways to do this, but maybe it will be good enough for your purposes.

import difflib
import sys

with open('/tmp/hosts0', 'r') as hosts0:
    with open('/tmp/hosts1', 'r') as hosts1:
        diff = difflib.unified_diff(
            hosts0.readlines(),
            hosts1.readlines(),
            fromfile='hosts0',
            tofile='hosts1',
            n=0,
        )
        for line in diff:
            for prefix in ('---', '+++', '@@'):
                if line.startswith(prefix):
                    break
            else:
                sys.stdout.write(line[1:])
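
One edge case in the prefix filter above: a removed line whose own text starts with "--" appears in the diff as "---..." and gets skipped, and the same happens for added lines starting with "++". A possible way around this, sketched here under the assumption that you only need the changed lines themselves, is to read the opcodes from `difflib.SequenceMatcher` instead of parsing diff text. The function name `changed_lines` and the sample data are hypothetical.

```python
import difflib


def changed_lines(lines_a, lines_b):
    """Return (removed, added) lists of lines, taken from SequenceMatcher opcodes."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b)
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        # 'replace' contributes to both lists; 'equal' contributes to neither.
        if tag in ('delete', 'replace'):
            removed.extend(lines_a[a_start:a_end])
        if tag in ('insert', 'replace'):
            added.extend(lines_b[b_start:b_end])
    return removed, added


removed, added = changed_lines(
    ['one', '-- old comment', 'two'],
    ['one', 'two', '++ new comment'],
)
print(removed)  # ['-- old comment']
print(added)    # ['++ new comment']
```

Because no "+" or "-" markers are ever added to the lines, there is nothing to strip and nothing to confuse with a header.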

Answer 4

Answered by rbutcher

You can add a conditional statement. If the index goes beyond the end of the shorter list, break out of the loop and print the rest of the longer file.
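
A minimal sketch of that idea, assuming both files have already been read into lists of lines. The function name `report_line_differences` and the message wording are hypothetical.

```python
def report_line_differences(lines_a, lines_b):
    """Return messages describing how lines_b differs from lines_a, position by position."""
    messages = []
    for i, line_b in enumerate(lines_b):
        if i >= len(lines_a):
            # lines_a has run out: stop comparing and report the rest of lines_b.
            messages.append('extra lines in second file: %r' % lines_b[i:])
            break
        if line_b != lines_a[i]:
            messages.append('line %d differs: %r != %r' % (i + 1, lines_a[i], line_b))
    else:
        # The loop ended without break, so any leftover lines belong to lines_a.
        if len(lines_a) > len(lines_b):
            messages.append('extra lines in first file: %r' % lines_a[len(lines_b):])
    return messages


for message in report_line_differences(['a', 'b'], ['a', 'x', 'c']):
    print(message)
```

This avoids the IndexError from the question, but it compares strictly by position, so a single inserted line makes every later line look different. The difflib approaches handle that case better.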

Answer 5

Answered by Azad Mehla

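import difflib

# Read both files; "with" closes them automatically.
with open('a.txt', 'r') as f, open('b.txt', 'r') as f1:
    str1 = f.read()
    str2 = f1.read()

# split() with no arguments breaks the text into words on any whitespace.
str1 = str1.split()
str2 = str2.split()

# Differ.compare() treats its first argument as the original sequence,
# so this shows how a.txt differs from b.txt, word by word.
d = difflib.Differ()
diff = list(d.compare(str2, str1))
print('\n'.join(diff))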
