How to concatenate three Excel (.xlsx) files using Python?
Original source: http://stackoverflow.com/questions/15793349/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Auré Vat
Hello, I would like to concatenate three Excel (.xlsx) files using Python.
I have tried using openpyxl, but I don't know which function could help me append three worksheets into one.
Do you have any ideas how to do that?
Thanks a lot
Answered by Henry Keiter
I'd use xlrd and xlwt. Assuming you literally just need to append these files (rather than doing any real work on them), I'd do something like this: open a file to write to with xlwt, then, for each of your three input files, loop over the data and add each row to the output file. To get you started:
import xlwt
import xlrd
# Note: xlrd 2.0 and later read only .xls files; opening .xlsx this way needs xlrd < 2.0.
wkbk = xlwt.Workbook()
outsheet = wkbk.add_sheet('Sheet1')
xlsfiles = [r'C:\foo.xlsx', r'C:\bar.xlsx', r'C:\baz.xlsx']
outrow_idx = 0
for f in xlsfiles:
    # This is all untested; essentially just pseudocode for concept!
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(insheet.nrows):
        for col_idx in range(insheet.ncols):
            outsheet.write(outrow_idx, col_idx,
                           insheet.cell_value(row_idx, col_idx))
        outrow_idx += 1  # advance once per copied row
wkbk.save(r'C:\combined.xls')
If your files all have a header line, you probably don't want to repeat it, so you could modify the code above to look more like this:
firstfile = True # Is this the first sheet?
for f in xlsfiles:
    insheet = xlrd.open_workbook(f).sheets()[0]
    for row_idx in range(0 if firstfile else 1, insheet.nrows):
        pass # processing; etc
    firstfile = False # We're done with the first sheet.
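The header-skipping idea can also be sketched without any Excel library, assuming each sheet has already been read into a list of rows. The function name `concat_rows` and the sample data are illustrative only and are not part of the original answer.

```python
def concat_rows(sheets, has_header=True):
    """Concatenate row lists, keeping the header row of the first sheet only."""
    combined = []
    for sheet_idx, rows in enumerate(sheets):
        # Skip row 0 on every sheet after the first when a header is present.
        start = 1 if (has_header and sheet_idx > 0) else 0
        combined.extend(rows[start:])
    return combined


# Made-up sample data standing in for three worksheets.
sheets = [
    [["name", "qty"], ["apple", 3]],
    [["name", "qty"], ["pear", 5]],
    [["name", "qty"], ["plum", 7]],
]
for row in concat_rows(sheets):
    print(row)
```

Each row of the result could then be written out with whichever writer library is in use.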
Answered by DSM
Here's a pandas-based approach. (It's using openpyxl behind the scenes.)
import pandas as pd
# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]
# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]
# concatenate them..
combined = pd.concat(frames)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)
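If the first row of every file really is a header, one alternative is to let pandas parse it (the default behaviour of `pd.read_excel`), so that no rows need to be deleted afterwards. The sketch below assumes all files share the same column names; the helper name `combine_frames` and the in-memory sample frames are hypothetical stand-ins for frames read from the real files.

```python
import pandas as pd


def combine_frames(frames):
    """Stack DataFrames that share the same column headers."""
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    return pd.concat(frames, ignore_index=True)


# Stand-in data; in practice each frame might come from pd.read_excel(path).
frames = [
    pd.DataFrame({"name": ["apple"], "qty": [3]}),
    pd.DataFrame({"name": ["pear"], "qty": [5]}),
    pd.DataFrame({"name": ["plum"], "qty": [7]}),
]
combined = combine_frames(frames)
print(combined)
```

Writing the result with `combined.to_excel("c.xlsx", index=False)` would then emit the header once, at the top.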
Answered by pbarill
Solution with openpyxl only (without a bunch of other dependencies).
This script should take care of merging together an arbitrary number of xlsx documents, whether they have one or multiple sheets. It will preserve the formatting.
There's a function to copy sheets in openpyxl, but it is only from/to the same file. There's also a function insert_rows somewhere, but by itself it won't insert any rows. So I'm afraid we are left to deal (tediously) with one cell at a time.
As much as I dislike using for loops and would rather use something compact and elegant like a list comprehension, I don't see how to do that here, as this is all about side effects.
Credit to this answer on copying between workbooks.
#!/usr/bin/env python3
# USAGE
# mergeXLSX.py <a bunch of .xlsx files> ... output.xlsx
#
# where output.xlsx is the unified file
# This works FROM/TO the xlsx format. Libreoffice might help to convert from xls.
# localc --headless --convert-to xlsx somefile.xls
import sys
from copy import copy
from openpyxl import load_workbook, Workbook
def createNewWorkbook(manyWb):
    # Copy every sheet of every input workbook into the global workbook theOne.
    for wb in manyWb:
        for sheetName in wb.sheetnames:
            o = theOne.create_sheet(sheetName)
            safeTitle = o.title  # openpyxl may rename the sheet to avoid duplicate titles
            copySheet(wb[sheetName], theOne[safeTitle])
def copySheet(sourceSheet, newSheet):
    # Copy values and styles one cell at a time.
    for row in sourceSheet.rows:
        for cell in row:
            newCell = newSheet.cell(row=cell.row, column=cell.col_idx,
                                    value=cell.value)
            if cell.has_style:
                newCell.font = copy(cell.font)
                newCell.border = copy(cell.border)
                newCell.fill = copy(cell.fill)
                newCell.number_format = copy(cell.number_format)
                newCell.protection = copy(cell.protection)
                newCell.alignment = copy(cell.alignment)
filesInput = sys.argv[1:]
theOneFile = filesInput.pop(-1)  # the last argument is the output file
myfriends = [ load_workbook(f) for f in filesInput ]
# try this if you are bored
# myfriends = [ load_workbook(f) for k in range(200) for f in filesInput ]
theOne = Workbook()
del theOne['Sheet'] # We want our new book to be empty. Thanks.
createNewWorkbook(myfriends)
theOne.save(theOneFile)
Tested with openpyxl 2.5.4, python 3.4.
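When formatting does not need to be preserved and all rows should end up in a single sheet, a simpler values-only sketch is possible with `Worksheet.append` and `iter_rows(values_only=True)` (the latter assumes openpyxl 2.6 or later). The helper name `append_sheets` and the in-memory sample workbooks are hypothetical and not part of the script above.

```python
from openpyxl import Workbook


def append_sheets(target_ws, source_sheets, skip_repeated_header=True):
    """Append the cell values of each source sheet to target_ws."""
    for sheet_idx, ws in enumerate(source_sheets):
        for row_idx, row in enumerate(ws.iter_rows(values_only=True)):
            if skip_repeated_header and sheet_idx > 0 and row_idx == 0:
                continue  # header was already copied from the first sheet
            target_ws.append(row)


# In-memory stand-ins; real sheets might come from load_workbook(path).active
sources = []
for name, qty in [("apple", 3), ("pear", 5)]:
    wb = Workbook()
    wb.active.append(["name", "qty"])
    wb.active.append([name, qty])
    sources.append(wb.active)

out = Workbook()
append_sheets(out.active, sources)
rows = list(out.active.iter_rows(values_only=True))
print(rows)
```

Calling `out.save("combined.xlsx")` afterwards would write the merged sheet to disk; styles, column widths and merged cells are not carried over in this variant.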
Answered by francisedward
When I combine Excel files (mydata1.xlsx, mydata2.xlsx, mydata3.xlsx) for data analysis, here is what I do:
import pandas as pd
import glob
# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate them once.
frames = []
for f in glob.glob('myfolder/mydata*.xlsx'):
    frames.append(pd.read_excel(f))
all_data = pd.concat(frames, ignore_index=True)
Then, when I want to save it as one file:
# ExcelWriter.save() was removed in pandas 2.0; the with block saves and closes the file.
with pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter') as writer:
    all_data.to_excel(writer, sheet_name='Sheet1')
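Since `glob.glob` returns matches in arbitrary order, it can help to sort the inputs and record which file each row came from. The sketch below assumes the frames have already been read into a dict keyed by file name; the helper `concat_with_source` and the `source_file` column are hypothetical names chosen for illustration.

```python
import pandas as pd


def concat_with_source(frames_by_name):
    """Concatenate frames in name order, recording each row's source file."""
    tagged = [
        df.assign(source_file=name)  # adds a column, leaves df itself untouched
        for name, df in sorted(frames_by_name.items())
    ]
    return pd.concat(tagged, ignore_index=True)


# Stand-in data; in practice: {f: pd.read_excel(f) for f in glob.glob(pattern)}
frames_by_name = {
    "mydata2.xlsx": pd.DataFrame({"value": [20]}),
    "mydata1.xlsx": pd.DataFrame({"value": [10]}),
}
all_data = concat_with_source(frames_by_name)
print(all_data)
```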
Answered by Dhruv Kadia
You can simply use the pandas and os libraries to do this.
import pandas as pd
import os
# collect the data from every file here, then combine it into one dataframe
frames = []
for files in os.listdir():
    # make sure you are only reading excel files
    if files.endswith('.xlsx'):
        data = pd.read_excel(files, index_col=None)
        frames.append(data)
        # move the file to another folder so that it is not processed multiple times
        os.rename(files, os.path.join('path to some other folder', files))
# DataFrame.append was removed in pandas 2.0; concatenate the collected frames instead
mergedData = pd.concat(frames)
mergedData will hold all the combined data, which you can export to a separate Excel or CSV file. The same code works with CSV files as well: change the extension in the if condition and read each file with pd.read_csv instead of pd.read_excel.
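To handle CSV and Excel inputs with one loop, the reader can be chosen from the file extension. This is a minimal sketch under that assumption; the helper `read_table` and the throwaway sample files are hypothetical and exist only for the demonstration.

```python
import os
import tempfile

import pandas as pd


def read_table(path):
    """Read a CSV or Excel file into a DataFrame based on its extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path)
    if ext == ".xlsx":
        return pd.read_excel(path)
    raise ValueError("unsupported file type: " + ext)


# Demonstration with two throwaway CSV files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name, qty in [("a.csv", 1), ("b.csv", 2)]:
        with open(os.path.join(folder, name), "w") as fh:
            fh.write("qty\n%d\n" % qty)
    paths = sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.endswith((".csv", ".xlsx"))
    )
    merged = pd.concat([read_table(p) for p in paths], ignore_index=True)

print(merged)
```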
Answered by Scott Weaver
Just to add to pbarill's answer: if you have custom column widths that you need to copy, you can add the following to the bottom of copySheet:
for col in sourceSheet.column_dimensions:
    newSheet.column_dimensions[col] = sourceSheet.column_dimensions[col]
I would just post this in a comment on his or her answer but my reputation isn't high enough.
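A variant sketch that copies only the width value of each column, assuming the aim is to carry over widths without sharing the same dimension object between two workbooks. The helper name `copy_column_widths` and the sample workbooks are hypothetical.

```python
from openpyxl import Workbook


def copy_column_widths(source_sheet, new_sheet):
    """Copy explicit column widths from source_sheet to new_sheet."""
    for col, dim in source_sheet.column_dimensions.items():
        if dim.width is not None:
            # Accessing column_dimensions[col] creates the entry if needed.
            new_sheet.column_dimensions[col].width = dim.width


# In-memory stand-ins for the source and destination sheets.
src = Workbook().active
src.column_dimensions["B"].width = 30
dst = Workbook().active
copy_column_widths(src, dst)
print(dst.column_dimensions["B"].width)
```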

