Python 使用熊猫组合/合并 2 个不同的 Excel 文件/工作表

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/25400240/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 20:10:25  来源:igfitidea点击:

Using pandas Combining/merging 2 different Excel files/sheets

pythonexcelpandas

提问by Mark K

I am trying to combine 2 different Excel files. (thanks to the post Import multiple excel files into python pandas and concatenate them into one dataframe)

我正在尝试合并 2 个不同的 Excel 文件。(感谢将多个 excel 文件导入 python pandas 并将它们连接成一个数据框一文

The one I work out so far is:

到目前为止我工作的一个是:

import os
import pandas as pd

df = pd.DataFrame()

for f in ['c:\file1.xls', 'c:\ file2.xls']:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

df.to_excel("c:\all.xls")

Here is how they look like.

这是它们的样子。

enter image description here

在此处输入图片说明

However I want to:

但是我想:

  1. Exclude the last rows of each file (i.e. row4 and row5 in File1.xls; row7 and row8 in File2.xls).
  2. Add a column (or overwrite Column A) to indicate where the data from.
  1. 排除每个文件的最后一行(即 File1.xls 中的第 4 行和第 5 行;File2.xls 中的第 7 行和第 8 行)。
  2. 添加一列(或覆盖 A 列)以指示数据的来源。

For example:

例如:

enter image description here

在此处输入图片说明

Is it possible? Thanks.

是否可以?谢谢。

采纳答案by behzad.nouri

For num. 1, you can specify skip_footeras explained here; or, alternatively, do

对于数。1,您可以skip_footer按照此处的说明进行指定;或者,或者,做

data = data.iloc[:-2]

once your read the data.

一旦你阅读了数据。

For num. 2, you may do:

对于数。2、你可以这样做:

from os.path import basename
data.index = [basename(f)] * len(data)

Also, perhaps would be better to put all the data-frames in a list and then concatthem at the end; something like:

此外,也许最好将所有数据帧放在一个列表中,然后将concat它们放在最后;就像是:

df = []
for f in ['c:\file1.xls', 'c:\ file2.xls']:
    data = pd.read_excel(f, 'Sheet1').iloc[:-2]
    data.index = [os.path.basename(f)] * len(data)
    df.append(data)

df = pd.concat(df)

回答by Amit Sharma

import os
import os.path
import xlrd
import xlsxwriter

file_name = input("Decide the destination file name in DOUBLE QUOTES: ")
merged_file_name = file_name + ".xlsx"
dest_book = xlsxwriter.Workbook(merged_file_name)
dest_sheet_1 = dest_book.add_worksheet()
dest_row = 1
temp = 0
path = input("Enter the path in DOUBLE QUOTES: ")
for root,dirs,files in os.walk(path):
    files = [ _ for _ in files if _.endswith('.xlsx') ]
    for xlsfile in files:
        print ("File in mentioned folder is: " + xlsfile)
        temp_book = xlrd.open_workbook(os.path.join(root,xlsfile))
        temp_sheet = temp_book.sheet_by_index(0)
        if temp == 0:
            for col_index in range(temp_sheet.ncols):
                str = temp_sheet.cell_value(0, col_index)
                dest_sheet_1.write(0, col_index, str)
            temp = temp + 1
        for row_index in range(1, temp_sheet.nrows):
            for col_index in range(temp_sheet.ncols):
                str = temp_sheet.cell_value(row_index, col_index)
                dest_sheet_1.write(dest_row, col_index, str)
            dest_row = dest_row + 1
dest_book.close()
book = xlrd.open_workbook(merged_file_name)
sheet = book.sheet_by_index(0)
print "number of rows in destination file are: ", sheet.nrows
print "number of columns in destination file are: ", sheet.ncols