Compare two different files line by line in python
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/19007383/
Question by Sanchit
I have two different files and I want to compare their contents line by line and write their common contents to a different file. Note that both of them contain some blank lines. Here is my pseudo code:
file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')
for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))
FO.close()
file1.close()
file2.close()
However, by doing this, I got lots of blank lines in my FO file. It seems the common blank lines are also written. I want to write only the text part. Can somebody please help me?
For example: my first file (file1) contains data:
Config:
Hostname = TUVALU
BT:
TS_Ball_Update_Threshold = 0.2
BT:
TS_Player_Search_Radius = 4
BT:
Ball_Template_Update = 0
while the second file (file2) contains data:
Pole_ID = 2
Width = 1280
Height = 1024
Color_Mode = 0
Sensor_Scale = 1
Tracking_ROI_Size = 4
Ball_Template_Update = 0
If you notice, the last two lines of each file are the same, hence I want to write them to my FO file. But the problem with my approach is that it also writes the common blank lines. Should I use regex for this problem? I do not have experience with regex.
Accepted answer by Rob?
This solution reads each file in a single pass, excludes blank lines, and writes the common lines to the output file regardless of their position in the files (the output order is arbitrary, because a set is unordered):
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
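Two details of the set version are worth knowing: the output order is arbitrary, and a last line that lacks a trailing newline (e.g. "Ball_Template_Update = 0" at the very end of one file) will not equal the same text followed by "\n" in the other file. A minimal sketch that keeps the order of the first file and compares stripped lines might look like this; the function name common_lines is a hypothetical choice, and note that strip() also ignores leading indentation, which may or may not be what you want.

```python
def common_lines(lines1, lines2):
    """Return the non-blank lines of lines1 that also occur in lines2.

    Order follows lines1 and each common line is reported once. Lines are
    compared with surrounding whitespace (including the newline) stripped.
    """
    # Stripped lines of the second input, in a set for fast membership tests.
    wanted = {line.strip() for line in lines2}
    wanted.discard("")  # blank or whitespace-only lines never count as common

    result = []
    seen = set()
    for line in lines1:
        text = line.strip()
        if text in wanted and text not in seen:
            seen.add(text)
            result.append(text)
    return result


# Usage with the file names from the question (file objects are iterables of lines):
# with open('some_file_1.txt') as f1, open('some_file_2.txt') as f2, \
#         open('some_output_file.txt', 'w') as file_out:
#     for text in common_lines(f1, f2):
#         file_out.write(text + '\n')
```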
Answer by falsetru
Once a file object has been iterated over, it is exhausted.
>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print line
...
1
2
3
# exhausted: another iteration does not produce anything.
>>> for line in f: print line
...
>>>
Use file.seek (or close and reopen the file) to rewind the file:
>>> f.seek(0)
>>> for line in f: print line
...
1
2
3
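Applied to the nested loop from the question, rewinding could look like the sketch below. It is an illustration under assumptions, not the answerer's code: the function name write_common_lines is hypothetical, blank lines are skipped, and each line of the first file is written at most once. It also avoids a second source of blank output lines: FO.write("%s\n" % line1) appends a newline to a line that normally already ends in "\n", producing an empty line after every match.

```python
def write_common_lines(file1, file2, out):
    """Nested-loop comparison that rewinds file2 for every line of file1."""
    for line1 in file1:
        if not line1.strip():
            continue  # skip blank lines, which would otherwise match each other
        file2.seek(0)  # rewind: otherwise file2 is exhausted after the first outer pass
        for line2 in file2:
            # Compare without the newline so a final line lacking "\n" still matches.
            if line1.rstrip("\n") == line2.rstrip("\n"):
                out.write(line1.rstrip("\n") + "\n")  # exactly one newline per line
                break  # one match is enough; move on to the next line of file1


# Usage with the question's files:
# with open('some_file_1.txt') as file1, open('some_file_2.txt') as file2, \
#         open('some_output_file.txt', 'w') as FO:
#     write_common_lines(file1, file2, FO)
```

This is still quadratic in the number of lines; for large files a set-based lookup is the better fit.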
Answer by Veedrac
If order is preserved between the files, you might also prefer difflib. Although Rob?'s result is the bona fide standard for intersections, you might actually be looking for something rough and diff-like:
from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")
That said, this behaves differently from what you asked for (here order matters), even though in this instance it produces the same output.
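If you only want the common lines, in order, and not the full diff text, difflib.SequenceMatcher exposes get_matching_blocks(), which reports runs of equal lines in both sequences. A minimal sketch, assuming the inputs are lists of lines (e.g. from readlines()); the function name common_lines_in_order is hypothetical, and disabling autojunk is a design choice so that frequently repeated lines in long files are not treated as junk.

```python
from difflib import SequenceMatcher


def common_lines_in_order(lines1, lines2):
    """Return lines shared by both lists in the same relative order, blanks dropped."""
    matcher = SequenceMatcher(None, lines1, lines2, autojunk=False)
    common = []
    for block in matcher.get_matching_blocks():
        # lines1[block.a:block.a + block.size] equals lines2[block.b:block.b + block.size];
        # the final block is a sentinel with size 0 and contributes nothing.
        common.extend(lines1[block.a:block.a + block.size])
    return [line for line in common if line.strip()]
```

Unlike a set intersection, lines that appear in both files but in swapped order are not all reported, which is exactly the order-sensitivity described above.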
Answer by Wayne Werner
Yet another example...
from __future__ import print_function  # Only needed on Python 2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)
And if you want to eliminate common blank lines, just change the if statement to:
if line1.strip() and line1 == line2:
.strip() removes all leading and trailing whitespace, so if that's all that's on a line, it will become an empty string "", which is considered false.
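To make the blank-line filter concrete and see where the files agree, a small variation is sketched below; the function name and the (line number, text) result format are illustrative choices, not part of the answer. It accepts any iterables of lines, so the open file objects can be passed directly. Keep in mind that zip() stops at the end of the shorter input, so extra lines in the longer file are never compared.

```python
def matching_lines_by_position(lines1, lines2):
    """Return (line_number, line) for non-blank lines equal at the same position."""
    matches = []
    for number, (line1, line2) in enumerate(zip(lines1, lines2), start=1):
        # line1.strip() is "" (falsy) for blank or whitespace-only lines.
        if line1.strip() and line1 == line2:
            matches.append((number, line1.rstrip("\n")))
    return matches
```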
Answer by Dominique
I have just been faced with the same challenge, but I thought, "Why program this in Python if you can solve it with a simple grep?" This led to the following Python code:
import subprocess
from subprocess import PIPE

try:
    # Raw strings keep the Windows backslashes literal: in a normal string
    # "\b" and "\f" would become backspace and form-feed characters.
    # Lines of file2.txt not matched by any line of file1.txt:
    output1, errors1 = subprocess.Popen([r"c:\cygwin\bin\grep", "-Fvf", r"c:\file1.txt", r"c:\file2.txt"],
                                        shell=True, stdout=PIPE, stderr=PIPE).communicate()
    # Lines of file1.txt not matched by any line of file2.txt:
    output2, errors2 = subprocess.Popen([r"c:\cygwin\bin\grep", "-Fvf", r"c:\file2.txt", r"c:\file1.txt"],
                                        shell=True, stdout=PIPE, stderr=PIPE).communicate()
    if len(output1) + len(output2) + len(errors1) + len(errors2) > 0:
        print("Compare result : There are differences:")
        if len(output1) + len(output2) > 0:
            print(" Output differences : ")
            print(output1)
            print(output2)
        if len(errors1) + len(errors2) > 0:
            print(" Errors : ")
            print(errors1)
            print(errors2)
    else:
        print("Compare result : Both files are equal")
except Exception as ex:
    print("Compare result : Exception during comparison")
    print(ex)
    raise
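The trick behind this is the following:
grep -Fvf file1.txt file2.txt
prints every line of file2.txt that does not contain any line of file1.txt (-F: treat the patterns as fixed strings, -v: invert the match, -f: read the patterns from file1.txt). If it prints nothing, every entry in file2.txt is present in file1.txt. By doing this in both directions we can see whether the contents of both files are "equal". I put "equal" in quotes because duplicate lines are disregarded in this way of working. Also note that, without -x, a pattern can match just part of a line, and a blank line in the pattern file matches every line.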
Obviously, this is just an example: you can replace grep by any command-line file comparison tool.
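If you would rather not depend on an external grep, the same two-way check can be sketched in plain Python. This is an approximation under stated assumptions: it compares whole lines (closer to grep -Fxvf than to -Fvf), ignores a trailing newline, and, like the grep trick, disregards duplicates and order. Both function names are hypothetical.

```python
def lines_missing_from(reference_lines, other_lines):
    """Return the lines of other_lines that do not occur as whole lines in reference_lines."""
    reference = {line.rstrip("\n") for line in reference_lines}
    return [line.rstrip("\n") for line in other_lines
            if line.rstrip("\n") not in reference]


def same_line_sets(lines1, lines2):
    """True when every line of each input occurs somewhere in the other."""
    # Materialise the inputs so file objects can be scanned twice.
    lines1, lines2 = list(lines1), list(lines2)
    return not lines_missing_from(lines1, lines2) and not lines_missing_from(lines2, lines1)
```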
Answer by Prashanth Babu
Try this:
# Raw strings: in "G:\test1.TXT" the "\t" would otherwise become a tab character.
filename1 = r"G:\test1.TXT"
filename2 = r"G:\test2.TXT"

with open(filename1) as f1, open(filename2) as f2:
    file1list = f1.read().splitlines()
    file2list = f2.read().splitlines()

if len(file1list) == len(file2list):
    # Compare the files position by position.
    for line1, line2 in zip(file1list, file2list):
        if line1 == line2:
            print(line1 + "==" + line2)
        else:
            print(line1 + "!=" + line2 + " Not-Equal")
else:
    print("difference in the size of the files (number of lines)")
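The answer above gives up with a single message when the files have different numbers of lines. A Python 3 sketch that keeps comparing anyway uses itertools.zip_longest, which pads the shorter input with None; compare_line_by_line is a hypothetical name and the tuple layout is an illustrative choice.

```python
from itertools import zip_longest


def compare_line_by_line(lines1, lines2):
    """Yield (line_number, line1, line2, equal) for every line position.

    Unlike zip(), zip_longest() runs to the end of the longer input and fills
    the missing side with None, so extra lines show up as unequal rows.
    """
    pairs = zip_longest(lines1, lines2)  # missing lines become None
    for number, (line1, line2) in enumerate(pairs, start=1):
        yield number, line1, line2, line1 == line2
```

For example, passing the two lists built with read().splitlines() and printing only the rows whose last element is False lists every differing position, including lines that exist in only one file.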
Answer by itzmeesuvm
If you are specifically looking for the difference between two files (here: the lines of the first file that are not in the second), this might help:
with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
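set(file1).difference(file2) only reports one direction. If you need both directions and a stable order, a minimal sketch follows; the function name one_sided_lines is hypothetical. Like the set version it compares whole lines and ignores duplicates, and it additionally ignores a missing newline at the end of the last line and drops blank lines.

```python
def one_sided_lines(lines1, lines2):
    """Return (only_in_first, only_in_second), each in order of first appearance."""
    # Normalise the trailing newline so a last line without "\n" still matches.
    first = [line.rstrip("\n") for line in lines1]
    second = [line.rstrip("\n") for line in lines2]
    first_set, second_set = set(first), set(second)
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
    only_first = [text for text in dict.fromkeys(first)
                  if text.strip() and text not in second_set]
    only_second = [text for text in dict.fromkeys(second)
                   if text.strip() and text not in first_set]
    return only_first, only_second
```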