Comparing two different files line by line in Python

Question

Asked by Sanchit

I have two different files and I want to compare their contents line by line, and write their common contents to a different file. Note that both of them contain some blank lines. Here is my pseudocode:

file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))

FO.close()
file1.close()
file2.close()

However, when I do this, I get lots of blank lines in my FO file. It seems the common blank lines are also written. I want to write only the text part. Can somebody please help me?

For example: my first file (file1) contains data:

Config:
Hostname = TUVALU

BT:
TS_Ball_Update_Threshold = 0.2

BT:
TS_Player_Search_Radius = 4

BT:
Ball_Template_Update = 0

while second file (file2) contains data:

Pole_ID      = 2
Width        = 1280
Height       = 1024
Color_Mode   = 0
Sensor_Scale = 1

Tracking_ROI_Size = 4
Ball_Template_Update = 0

If you notice, the last line of each file is the same, so I want to write that line to my FO file. The problem with my approach is that it also writes the common blank lines. Should I use regex for this problem? I do not have experience with regex.

Answer 1

Accepted answer by Rob?

This solution reads both files in one pass, excludes blank lines, and writes the common lines regardless of their position in the file:

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
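
Because a set is unordered, the lines written by the code above may come out in any order. If the original order matters, a minimal sketch along the following lines keeps the order of the first file. The names `common_lines` and `write_common_lines` are hypothetical, and trailing whitespace is stripped here (an assumption, not part of the answer) so that a last line without a newline still matches.

```python
def common_lines(path1, path2):
    """Return non-blank lines present in both files, in the order of path1."""
    with open(path2) as f2:
        # Strip trailing whitespace so a final line lacking '\n' still matches.
        wanted = {line.rstrip() for line in f2}
    wanted.discard('')  # ignore blank lines

    result = []
    seen = set()
    with open(path1) as f1:
        for line in f1:
            text = line.rstrip()
            if text in wanted and text not in seen:
                seen.add(text)  # report each common line only once
                result.append(text)
    return result


def write_common_lines(path1, path2, out_path):
    """Write the common lines of path1 and path2 to out_path, one per line."""
    with open(out_path, 'w') as out:
        for text in common_lines(path1, path2):
            out.write(text + '\n')
```

Calling `write_common_lines('some_file_1.txt', 'some_file_2.txt', 'some_output_file.txt')` would mirror the file names used in the question.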

Answer 2

Answered by falsetru

Once a file object has been iterated over, it is exhausted.

>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print line
...
1

2

3

# exhausted, another iteration does not produce anything.
>>> for line in f: print line
...
>>>

Use file.seek (or close and reopen the file) to rewind the file:

>>> f.seek(0)
>>> for line in f: print line
...
1

2

3
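
Applied to the nested loop from the question, rewinding could look like the sketch below, written for Python 3. The function name `common_lines_nested` is hypothetical; it keeps the asker's two-loop structure, rewinds the second file before each inner pass, and skips blank lines.

```python
def common_lines_nested(path1, path2):
    """Collect non-blank lines of path1 that also occur in path2."""
    common = []
    with open(path1) as file1, open(path2) as file2:
        for line1 in file1:
            if not line1.strip():
                continue  # skip blank lines
            # Rewind: without this, file2 is exhausted after the first pass.
            file2.seek(0)
            for line2 in file2:
                if line1 == line2:
                    common.append(line1)
                    break  # one match is enough for this line
    return common
```

This rereads the second file once for every line of the first, so it is only reasonable for small files; a set-based approach avoids the repeated reads.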

Answer 3

Answered by Veedrac

If the lines appear in the same order in both files, you might also prefer difflib. Rob?'s answer is the bona fide standard for intersections, but you might actually be looking for something closer to a rough diff:

from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")

That said, this behaves differently from what you asked for (order matters), even though it produces the same output in this instance.

Answer 4

Answered by Wayne Werner

Yet another example...

from __future__ import print_function #Only for Python2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)

And if you want to eliminate common blank lines, just change the if statement to:

if line1.strip() and line1 == line2:

.strip() removes all leading and trailing whitespace, so if that's all that's on a line, it becomes an empty string "", which is considered false.
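
Note that `zip` pairs lines by position, so with this approach a line counts as common only when it sits at the same line number in both files. A small sketch using in-memory lists instead of files (the helper name `same_position_lines` is hypothetical) illustrates the effect:

```python
def same_position_lines(lines1, lines2):
    """Return non-blank lines that are identical at the same index."""
    return [
        line1
        for line1, line2 in zip(lines1, lines2)
        if line1.strip() and line1 == line2
    ]


first = ["a\n", "\n", "b\n", "c\n"]
second = ["a\n", "\n", "c\n", "b\n"]
# "b" and "c" occur in both lists but at different positions, so they are
# not reported; the shared blank line is dropped by the strip() check.
print(same_position_lines(first, second))  # ['a\n']
```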

Answer 5

Answered by Dominique

I have just faced the same challenge, but I thought "Why program this in Python if you can solve it with a simple grep?", which led to the following Python code:

import subprocess
from subprocess import PIPE

# Raw strings keep the backslashes literal; otherwise "\b" and "\f" are read as escape sequences.
GREP = r"c:\cygwin\bin\grep"
FILE1 = r"c:\file1.txt"
FILE2 = r"c:\file2.txt"

try:
  # Run grep in both directions and capture its output and error streams.
  output1, errors1 = subprocess.Popen([GREP, "-Fvf", FILE1, FILE2], shell=True, stdout=PIPE, stderr=PIPE).communicate()
  output2, errors2 = subprocess.Popen([GREP, "-Fvf", FILE2, FILE1], shell=True, stdout=PIPE, stderr=PIPE).communicate()
  if len(output1) + len(output2) + len(errors1) + len(errors2) > 0:
    print("Compare result : There are differences:")
    if len(output1) + len(output2) > 0:
      print("  Output differences : ")
      print(output1)
      print(output2)
    if len(errors1) + len(errors2) > 0:
      print(" Errors : ")
      print(errors1)
      print(errors2)
  else:
    print("Compare result : Both files are equal")
except Exception as ex:
  print("Compare result : Exception during comparison")
  print(ex)
  raise

The trick behind this is the following: grep -Fvf file1.txt file2.txt prints the lines of file2.txt that match no line of file1.txt, so empty output means that all entries in file2.txt are present in file1.txt. By doing this in both directions we can see whether the contents of both files are "equal". I put "equal" in quotes because duplicate lines are disregarded in this way of working.

Obviously, this is just an example: you can replace grep with any command-line file comparison tool.
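
For comparison, the same two-direction check can be sketched without an external tool. This is an approximation rather than an exact equivalent: it compares whole lines, whereas grep -F without -x matches substrings. The function name `line_differences` is hypothetical.

```python
def line_differences(path1, path2):
    """Return (only_in_1, only_in_2): lines found in one file but not the other."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = {line.rstrip('\n') for line in f1}
        lines2 = {line.rstrip('\n') for line in f2}
    # Sets drop duplicates, so repeated lines are disregarded here as well.
    return lines1 - lines2, lines2 - lines1
```

If both returned sets are empty, the files are "equal" in the same loose sense as above: they contain the same distinct lines, possibly in a different order or with different repetition counts.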

Answer 6

Answered by Prashanth Babu

Try this:

# Raw strings keep the backslash literal; "\t" would otherwise be read as a tab.
filename1 = r"G:\test1.TXT"
filename2 = r"G:\test2.TXT"

with open(filename1) as f1, open(filename2) as f2:
    file1list = f1.read().splitlines()
    file2list = f2.read().splitlines()

if len(file1list) == len(file2list):
    # Same number of lines: compare them pairwise by position.
    for line1, line2 in zip(file1list, file2list):
        if line1 == line2:
            print(line1 + "==" + line2)
        else:
            print(line1 + "!=" + line2 + " Not-Equal")
else:
    print("The files have a different number of lines")

Answer 7

Answered by itzmeesuvm

If you specifically want the difference between two files (the lines of the first file that do not appear in the second), then this might help:

with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
