pandas 如何连接同一个文件中的多个excel表?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/46605910/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 04:35:43  来源:igfitidea点击:

how to concatenate multiple excel sheets from the same file?

pythonexcelpandasdataframe

提问by ??????

I have a big excel file that contains many different sheets. All the sheets have the same structure like:

我有一个包含许多不同工作表的大 excel 文件。所有工作表都具有相同的结构,例如:

Name
col1  col2  col3  col4
1     1     2     4
4     3     2     1
  • How can I concatenate (vertically) all these sheets in Pandaswithout having to name each of them manually? If these were files, I could use globto obtain a list of files in a directory. But here, for excel sheets, I am lost.
  • Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from?
  • 我怎样才能(垂直)连接所有这些工作表Pandas而不必手动命名它们?如果这些是文件,我可以glob用来获取目录中的文件列表。但是在这里,对于excel表,我迷路了。
  • 有没有办法在结果数据框中创建一个变量来标识数据来自的工作表名称?

Thanks!

谢谢!

回答by MaxU

Try this:

尝试这个:

dfs = pd.read_excel(filename, sheetname=None, skiprows=1)

this will return you a dictionary of DFs, which you can easily concatenate using pd.concat(dfs)or as @jezrael has already posted in his answer:

这将为您返回一个 DF 字典,您可以轻松地使用pd.concat(dfs)或如@jezrael 已经在他的回答中发布的那样进行连接:

df = pd.concat(pd.read_excel(filename, sheetname=None, skiprows=1))

sheetname: None -> All sheets as a dictionary of DataFrames

sheetname: None -> 所有工作表作为 DataFrames 的字典

UPDATE:

更新:

Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from?

有没有办法在结果数据框中创建一个变量来标识数据来自的工作表名称?

dfs = pd.read_excel(filename, sheetname=None, skiprows=1)

assuming we've got the following dict:

假设我们有以下字典:

In [76]: dfs
Out[76]:
{'d1':    col1  col2  col3  col4
 0     1     1     2     4
 1     4     3     2     1, 'd2':    col1  col2  col3  col4
 0     3     3     4     6
 1     6     5     4     3}

Now we can add a new column:

现在我们可以添加一个新列:

In [77]: pd.concat([df.assign(name=n) for n,df in dfs.items()])
Out[77]:
   col1  col2  col3  col4 name
0     1     1     2     4   d1
1     4     3     2     1   d1
0     3     3     4     6   d2
1     6     5     4     3   d2

回答by jezrael

First add parameter sheetname=Nonefor dictof DataFramesand skiprows=1for omit first row and then use concatfor MultiIndex DataFrame.

首先添加参数sheetname=Nonefor dictofDataFramesskiprows=1for 省略第一行,然后使用concatfor MultiIndex DataFrame

Last use reset_indexfor column from first level:

reset_index第一级列的最后一次使用:

df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheetname=None, skiprows=1))
df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()

回答by blacksite

Taking a note from this question:

这个问题中记下

import pandas as pd

file = pd.ExcelFile('file.xlsx')

names = file.sheet_names  # see all sheet names

df = pd.concat([file.parse(name) for name in names])

Results:

结果:

df
Out[6]: 
   A  B
0  1  3
1  2  4
0  5  6
1  7  8

Then you can run df.reset_index(), to, well, reset the index.

然后您可以运行df.reset_index(), 以重置索引。

Edit: pandas.ExcelFile.parseis, according to the pandas docs:

编辑:pandas.ExcelFile.parse是,根据Pandas文档:

Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters

等效于 read_excel(ExcelFile, ...) 有关可接受参数的更多信息,请参阅 read_excel 文档字符串

回答by malathivenkatesan

file_save_location='myfolder'                                
file_name='filename'

location = ''myfolder1'
os.chdir(location)
files_xls = glob.glob("*.xls*")
excel_names=[f for f in files_xls]
sheets = pd.ExcelFile(files_xls[0]).sheet_names
def combine_excel_to_dfs(excel_names, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df

i = 0

while i < len(sheets):
    process = sheets[i]
    consolidated_file= combine_excel_to_dfs(excel_names, process)
    consolidated_file.to_csv(file_save_location+file_name+'.csv')
    i = i+1
else:
    "we done on consolidation part"