Returning lines that differ between two files (Python)
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17799680/
Asked by user2597879
I have two files with tens of thousands of lines each, output1.txt and output2.txt. I want to iterate through both files and return the line numbers (and content) of the lines that differ between the two. They're mostly the same, which is why I can't find the differences (filecmp.cmp returns False).
Accepted answer by dawg
You can do something like this:
import difflib, sys

tl=100000    # large number of lines

# create two test files (Unix directories...)
with open('/tmp/f1.txt','w') as f:
    for x in range(tl):
        f.write('line {}\n'.format(x))

with open('/tmp/f2.txt','w') as f:
    for x in range(tl+10):       # add 10 lines
        if x in (500,505,1000,tl-2):
            continue             # skip these lines
        f.write('line {}\n'.format(x))

with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    diff = difflib.ndiff(f1.readlines(),f2.readlines())
    for line in diff:
        if line.startswith('-'):
            sys.stdout.write(line)
        elif line.startswith('+'):
            sys.stdout.write('\t\t'+line)
Prints (in 400 ms):
- line 500
- line 505
- line 1000
- line 99998
		+ line 100000
		+ line 100001
		+ line 100002
		+ line 100003
		+ line 100004
		+ line 100005
		+ line 100006
		+ line 100007
		+ line 100008
		+ line 100009
If you want a running count for each differing line, use enumerate (the index counts entries in the diff output, not line numbers in either file):
with open('/tmp/f1.txt','r') as f1, open('/tmp/f2.txt','r') as f2:
    diff = difflib.ndiff(f1.readlines(),f2.readlines())
    for i,line in enumerate(diff):
        if line.startswith(' '):
            continue
        sys.stdout.write('My count: {}, text: {}'.format(i,line))
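Since enumerate over the ndiff output counts diff entries rather than positions in either file, one way to recover real line numbers is to keep a separate counter per file while walking the diff. The following is only a sketch of that idea: the helper name diff_with_line_numbers and the sample lists are made up for illustration and are not part of the original answer.

```python
import difflib

def diff_with_line_numbers(a_lines, b_lines):
    """Yield (sign, line_number, text) for each line that differs.

    line_number is 1-based and refers to the first sequence for '-'
    entries and to the second sequence for '+' entries.
    """
    a_no = b_no = 0
    for entry in difflib.ndiff(a_lines, b_lines):
        tag, text = entry[:2], entry[2:]
        if tag == '  ':        # line present in both sequences
            a_no += 1
            b_no += 1
        elif tag == '- ':      # line only in the first sequence
            a_no += 1
            yield ('-', a_no, text)
        elif tag == '+ ':      # line only in the second sequence
            b_no += 1
            yield ('+', b_no, text)
        # '? ' hint lines belong to neither file and are skipped

# Hypothetical sample data standing in for the two files
first = ['a\n', 'b\n', 'c\n']
second = ['a\n', 'c\n', 'd\n']
for sign, number, text in diff_with_line_numbers(first, second):
    print('{} line {}: {}'.format(sign, number, text.rstrip('\n')))
```

With real files, the two lists would come from f1.readlines() and f2.readlines() as in the answer above.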
Answer by John La Rooy
7.4. difflib— Helpers for computing deltas
New in version 2.1.
This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs. For comparing directories and files, see also, the filecmp module.
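As an illustration of one of the formats the documentation mentions, difflib.unified_diff produces a unified diff from two lists of lines. This is a minimal sketch, not part of the original answer: the helper name changed_lines and the sample lines are hypothetical, and output1.txt / output2.txt are used only as labels in the diff header.

```python
import difflib

def changed_lines(a_lines, b_lines):
    """Return the unified diff of two line lists with no context lines."""
    return list(difflib.unified_diff(
        a_lines, b_lines,
        fromfile='output1.txt',   # label only; no file is opened here
        tofile='output2.txt',     # label only
        n=0,                      # n=0 drops unchanged context lines
    ))

# Hypothetical sample data standing in for the two files
old = ['line 1\n', 'line 2\n', 'line 3\n']
new = ['line 1\n', 'line two\n', 'line 3\n']
for entry in changed_lines(old, new):
    print(entry, end='')
```

The first two entries are the file header lines, each hunk starts with an @@ line giving the affected line ranges, and the remaining entries are the removed (-) and added (+) lines.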
Answer by korylprince
As long as you don't care about order you could use:
with open('file1') as f:
    t1 = f.read().splitlines()
    t1s = set(t1)

with open('file2') as f:
    t2 = f.read().splitlines()
    t2s = set(t2)

# in file1 but not file2
print("Only in file1")
for diff in t1s-t2s:
    # list.index gives the 0-based position of the first occurrence
    print(t1.index(diff), diff)

# in file2 but not file1
print("Only in file2")
for diff in t2s-t1s:
    print(t2.index(diff), diff)
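A possible variation on the same set-based idea: list.index rescans the list for every differing line and only reports the first occurrence, so for large files it may be preferable to walk one file once and test membership against a set built from the other. The sketch below assumes that approach; the helper name only_in_first and the sample lists are hypothetical and not from the original answer.

```python
def only_in_first(a_lines, b_lines):
    """Return (line_number, text) pairs for lines of a_lines absent from b_lines.

    Line numbers are 1-based and results keep the order of a_lines.
    """
    b_set = set(b_lines)  # set membership tests are O(1) on average
    return [(number, text)
            for number, text in enumerate(a_lines, start=1)
            if text not in b_set]

# Hypothetical sample data standing in for the two files
file1_lines = ['x', 'y', 'z', 'y']
file2_lines = ['x', 'z']
print("Only in file1")
for number, text in only_in_first(file1_lines, file2_lines):
    print(number, text)
```

Calling the helper with the arguments swapped gives the lines found only in the second file. Like the original, this ignores where a line sits in the other file, so a line that merely moved is not reported.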
Edit: If you do care about order and the files are mostly the same, why not just use the diff command?