pandas: Combining Excel workbook sheets into one using Python
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47743741/
Combining Excel workbook sheets into one using Python
Asked by Jesus
I have roughly 30 Excel workbooks that I need to combine into one. Each workbook has a variable number of sheets, but the sheet I need from each workbook is called "Output", and the column format of this sheet is consistent across workbooks.
I need to import the Output sheet from the first file, then append the Output sheets of the remaining files while ignoring their header rows.
I have tried to do this using glob/pandas, to no avail.
Answered by Omni
You could use openpyxl. Here's a sketch of the code:
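from openpyxl import load_workbook

# Start from the first workbook so that its header row is kept
compiled_wb = load_workbook(filename='yourfile1.xlsx')
compiled_ws = compiled_wb['Output']

# Append the data rows of the remaining files (2 through 30)
for i in range(2, 31):
    wb = load_workbook(filename='yourfile{}.xlsx'.format(i))
    ws = wb['Output']
    # min_row=2 skips the header row; values_only yields plain cell values
    for row in ws.iter_rows(min_row=2, values_only=True):
        compiled_ws.append(row)

compiled_wb.save('compiled.xlsx')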
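The sketch above assumes files named yourfile1.xlsx through yourfile30.xlsx. If the workbooks have arbitrary names, the same idea can be wrapped in a function that takes a list of paths. The following is a minimal sketch and not part of the original answer: the function name merge_output_sheets is hypothetical, it writes to a fresh workbook rather than modifying the first file, and it assumes openpyxl 2.6 or later (for values_only).

```python
from openpyxl import Workbook, load_workbook


def merge_output_sheets(paths, out_path, sheet_name="Output"):
    """Copy `sheet_name` from every workbook in `paths` into one new sheet.

    The header row is taken from the first workbook only.
    Only cell values are copied, not styles.
    """
    merged_wb = Workbook()
    merged_ws = merged_wb.active
    merged_ws.title = sheet_name

    for index, path in enumerate(paths):
        wb = load_workbook(filename=path)
        ws = wb[sheet_name]
        # Keep row 1 (the header) for the first file, skip it afterwards
        first_row = 1 if index == 0 else 2
        for row in ws.iter_rows(min_row=first_row, values_only=True):
            merged_ws.append(row)

    merged_wb.save(out_path)
```

It could be called with a globbed list of files, for example merge_output_sheets(sorted(glob.glob('workbooks/*.xlsx')), 'compiled.xlsx') after import glob, where the workbooks folder is a made-up name. Sorting the paths keeps the row order reproducible.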
Answered by manandearth
Method shown by Clinton W. Brownley in Foundations for Analytics with Python:
Run the script from a shell, passing the path to the folder that contains the Excel files (make sure the argument that defines all_workbooks is correct), followed by the path of the Excel output file:
python script.py </path/to/excel/folder/> <your/final/output.xlsx>
script.py:
import glob
import os
import sys

import pandas as pd

input_path = sys.argv[1]   # folder containing the .xlsx files
output_file = sys.argv[2]  # path of the combined workbook to write

all_workbooks = glob.glob(os.path.join(input_path, '*.xlsx'))
all_df = []
for workbook in all_workbooks:
    # sheet_name='Output' returns one DataFrame; the header row becomes the column names
    data = pd.read_excel(workbook, sheet_name='Output', index_col=None)
    all_df.append(data)

data_concatenated = pd.concat(all_df, axis=0, ignore_index=True)

# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter(output_file) as writer:
    data_concatenated.to_excel(writer, sheet_name='concatenated_Output', index=False)
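The same approach can be written as a reusable function. This is a minimal sketch rather than the book's code: the function name combine_output_sheets is hypothetical, and it assumes a pandas installation with openpyxl available as the Excel engine. It sorts the paths so the row order is reproducible, and it skips the output file when that file sits in the input folder, so a second run does not try to read the previous result.

```python
import glob
import os

import pandas as pd


def combine_output_sheets(input_path, output_file, sheet_name="Output"):
    """Stack `sheet_name` from every .xlsx file in `input_path` into one sheet."""
    output_abs = os.path.abspath(output_file)
    frames = []
    # sorted() makes the row order reproducible; glob order is arbitrary
    for workbook in sorted(glob.glob(os.path.join(input_path, "*.xlsx"))):
        # Skip a previous result if the output file is in the input folder
        if os.path.abspath(workbook) == output_abs:
            continue
        # Each file's header row becomes column names, so it is not repeated
        frames.append(pd.read_excel(workbook, sheet_name=sheet_name))

    combined = pd.concat(frames, axis=0, ignore_index=True)
    combined.to_excel(output_file, sheet_name="concatenated_" + sheet_name, index=False)
    return combined
```

Note that pd.concat raises a ValueError when the list of frames is empty, so an input folder without any matching .xlsx files fails loudly instead of writing an empty workbook.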
Answered by ASH
This will probably get down-voted because it isn't a Python answer, but honestly, I wouldn't use Python for this kind of task. I think you are far better off installing the add-in below and using that for the job.
https://www.rondebruin.nl/win/addins/rdbmerge.htm
Click 'Merge all files from the folder in the Files location selection', then set 'Use a Worksheet name' = 'Output', and finally, I think you want 'First cell'. Good luck!