使用 Python 比较 2 个 excel 文件

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/37113173/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 18:53:01  来源:igfitidea点击:

Compare 2 excel files using Python

pythonexcelpandascomparexlrd

提问by Abbas

I have two xlsxfiles as follows:

我有两个xlsx文件如下:

value1   value2   value3
0.456   3.456    0.4325436
6.24654 0.235435 6.376546
4.26545 4.264543 7.2564523

and

value1   value2  value3
0.456   3.456    0.4325436
6.24654 0.23546  6.376546
4.26545 4.264543 7.2564523

I need to compare all cells, and if a cell from file1 !=a cell from file2printthat.

我需要比较所有单元格,如果一个单元file1 !=格来自file2print那个单元格。

import xlrd
rb = xlrd.open_workbook('file1.xlsx')
rb1 = xlrd.open_workbook('file2.xlsx')
sheet = rb.sheet_by_index(0)
for rownum in range(sheet.nrows):
    row = sheet.row_values(rownum)
    for c_el in row:
        print c_el

How can I add the comparison cell of file1and file2?

如何添加的比较电池file1file2

采纳答案by Martin Evans

The following approach should get you started:

以下方法应该可以帮助您入门:

from itertools import izip_longest
import xlrd

rb1 = xlrd.open_workbook('file1.xlsx')
rb2 = xlrd.open_workbook('file2.xlsx')

sheet1 = rb1.sheet_by_index(0)
sheet2 = rb2.sheet_by_index(0)

for rownum in range(max(sheet1.nrows, sheet2.nrows)):
    if rownum < sheet1.nrows:
        row_rb1 = sheet1.row_values(rownum)
        row_rb2 = sheet2.row_values(rownum)

        for colnum, (c1, c2) in enumerate(izip_longest(row_rb1, row_rb2)):
            if c1 != c2:
                print "Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2)
    else:
        print "Row {} missing".format(rownum+1)

This will display any cells which are different between the two files. For your given two files, this will display:

这将显示两个文件之间不同的任何单元格。对于给定的两个文件,这将显示:

Row 3 Col 2 - 0.235435 != 0.23546

If you prefer cell names, then use xlrd.formular.colname():

如果您更喜欢单元格名称,请使用xlrd.formular.colname()

print "Cell {}{}  {} != {}".format(rownum+1, xlrd.formula.colname(colnum), c1, c2)

Giving you:

给你:

Cell 3B  0.235435 != 0.23546

回答by Abbas

Use pandasand you can do it as simple as this:

使用pandas,您可以像这样简单地做到这一点:

import pandas as pd

df1 = pd.read_excel('excel1.xlsx')
df2 = pd.read_excel('excel2.xlsx')

difference = df1[df1!=df2]
print difference

And the result will look like this:

结果将如下所示:

enter image description here

在此处输入图片说明

回答by sharinganSawant

I use a code for doing something similar. It's a bit generalized works well. Input Excel Sheets and expected Output Dataframe image

我使用代码来做类似的事情。它有点泛化效果很好。 输入 Excel 表格和预期的输出数据帧图像

import pandas as pd
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell

template = pd.read_excel("template.xlsx",na_values=np.nan,header=None)
testSheet = pd.read_excel("test.xlsx",na_values=np.nan,header=None)

rt,ct = template.shape
rtest,ctest = testSheet.shape

df = pd.DataFrame(columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])

for rowNo in range(max(rt,rtest)):
  for colNo in range(max(ct,ctest)):
    # Fetching the template value at a cell
    try:
        template_val = template.iloc[rowNo,colNo]
    except:
        template_val = np.nan

    # Fetching the testsheet value at a cell
    try:
        testSheet_val = testSheet.iloc[rowNo,colNo]
    except:
        testSheet_val = np.nan

    # Comparing the values
    if (str(template_val)!=str(testSheet_val)):
        cell = xl_rowcol_to_cell(rowNo, colNo)
        dfTemp = pd.DataFrame([[cell,template_val,testSheet_val]],
                              columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])
        df = df.append(dfTemp)

df is the required dataframe

df 是所需的数据框