pandas: How to concatenate multiple Excel sheets from the same file?

Question

Asked by ??????

I have a big Excel file that contains many different sheets. All the sheets have the same structure, like:

Name
col1  col2  col3  col4
1     1     2     4
4     3     2     1

How can I concatenate (vertically) all these sheets in pandas without having to name each of them manually? If these were files, I could use glob to obtain a list of files in a directory. But here, for Excel sheets, I am lost.
Is there a way to create a variable in the resulting DataFrame that identifies the sheet the data comes from?

Thanks!

Answer 1

Answered by MaxU

Try this:

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)

This returns a dictionary of DataFrames keyed by sheet name (skiprows=1 skips the "Name" title row above the column headers). You can concatenate it with pd.concat(dfs) or, as @jezrael has already posted in his answer, in a single expression:

df = pd.concat(pd.read_excel(filename, sheet_name=None, skiprows=1))

sheet_name: None -> All sheets as a dictionary of DataFrames

(Older pandas releases spelled this parameter sheetname; it was renamed to sheet_name in pandas 0.21.)
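
To see what pd.concat does with such a dictionary without needing a workbook on disk, here is a minimal sketch. The dict below is hand-built stand-in data (the sheet names d1 and d2 and the values are illustrative); pd.read_excel(filename, sheet_name=None, skiprows=1) would return a dict of the same shape.

```python
import pandas as pd

# Stand-in for pd.read_excel(filename, sheet_name=None, skiprows=1):
# a dict mapping sheet name -> DataFrame. The data is illustrative.
dfs = {
    "d1": pd.DataFrame({"col1": [1, 4], "col2": [1, 3], "col3": [2, 2], "col4": [4, 1]}),
    "d2": pd.DataFrame({"col1": [3, 6], "col2": [3, 5], "col3": [4, 4], "col4": [6, 3]}),
}

# Passing the dict stacks the frames vertically; the dict keys become
# the outer level of a two-level row index.
stacked = pd.concat(dfs)

# If the sheet name is not needed, concatenate the values and renumber the rows.
flat = pd.concat(list(dfs.values()), ignore_index=True)

print(stacked)
print(flat)
```

Passing the dict keeps the sheet name available in the index, while the list form discards it.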

UPDATE:

Is there a way to create a variable in the resulting DataFrame that identifies the sheet the data comes from?

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)

Assuming we've got the following dict:

In [76]: dfs
Out[76]:
{'d1':    col1  col2  col3  col4
 0     1     1     2     4
 1     4     3     2     1, 'd2':    col1  col2  col3  col4
 0     3     3     4     6
 1     6     5     4     3}

Now we can add a new column:

In [77]: pd.concat([df.assign(name=n) for n,df in dfs.items()])
Out[77]:
   col1  col2  col3  col4 name
0     1     1     2     4   d1
1     4     3     2     1   d1
0     3     3     4     6   d2
1     6     5     4     3   d2
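
The same idea can be wrapped in a small reusable helper. This is only a sketch: the function name concat_with_sheet_name and its column parameter are hypothetical, and the dict is stand-in data in place of the result of pd.read_excel(filename, sheet_name=None, skiprows=1). It also passes ignore_index=True so the row labels run 0..n-1 instead of restarting for every sheet.

```python
import pandas as pd


def concat_with_sheet_name(dfs, column="name"):
    """Stack a dict of DataFrames and record each row's dict key in `column`."""
    tagged = [df.assign(**{column: key}) for key, df in dfs.items()]
    # ignore_index=True renumbers the rows instead of repeating 0, 1 per sheet.
    return pd.concat(tagged, ignore_index=True)


# Illustrative stand-in for the dict returned by pd.read_excel(..., sheet_name=None).
dfs = {
    "d1": pd.DataFrame({"col1": [1, 4], "col2": [1, 3]}),
    "d2": pd.DataFrame({"col1": [3, 6], "col2": [3, 5]}),
}

result = concat_with_sheet_name(dfs, column="sheet")
print(result)
```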

Answer 2

Answered by jezrael

First pass sheet_name=None to get a dict of DataFrames and skiprows=1 to skip the first row, then use concat to build a DataFrame with a MultiIndex.

Finally, use reset_index to turn the first index level into a column:

df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1))
df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()
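
To make the index handling easier to follow, here is a sketch that applies the same steps to an in-memory dict, followed by an equivalent variant that names the outer index level at concat time through the names argument. The dict is illustrative stand-in data for what read_excel would return; the column label filenames follows the answer, although the values are sheet names.

```python
import pandas as pd

# Illustrative stand-in for pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1).
dfs = {
    "d1": pd.DataFrame({"col1": [1, 4], "col2": [1, 3]}),
    "d2": pd.DataFrame({"col1": [3, 6], "col2": [3, 5]}),
}

# Step 1: dict keys become index level 0; the original row labels are level 1.
df = pd.concat(dfs)

# Step 2: drop the inner level, name the remaining level, and move it into a column.
df = df.reset_index(level=1, drop=True).rename_axis("filenames").reset_index()

# Variant: name the outer level while concatenating, move it into a column,
# then discard the leftover per-sheet row labels.
alt = (
    pd.concat(dfs, names=["filenames"])
    .reset_index(level=0)
    .reset_index(drop=True)
)

print(df)
print(alt)
```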

Answer 3

Answered by blacksite

Taking a note from this question:

import pandas as pd

file = pd.ExcelFile('file.xlsx')

names = file.sheet_names  # see all sheet names

df = pd.concat([file.parse(name) for name in names])

Then you can run df.reset_index() to, well, reset the index.

Edit: pandas.ExcelFile.parse is, according to the pandas docs:

Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters
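
The ExcelFile approach above loses track of which sheet each row came from. A possible sketch that keeps it is shown here; the helper name concat_workbook_sheets and its column parameter are hypothetical, and extra keyword arguments (for example skiprows=1 for the layout in the question) are simply forwarded to parse.

```python
import pandas as pd


def concat_workbook_sheets(book, column="sheet", **parse_kwargs):
    """Stack every sheet of an opened workbook, tagging rows with the sheet name.

    `book` is expected to be a pd.ExcelFile (anything exposing `sheet_names`
    and `parse(name, ...)` works). Extra keyword arguments are passed to `parse`.
    """
    frames = [
        book.parse(name, **parse_kwargs).assign(**{column: name})
        for name in book.sheet_names
    ]
    return pd.concat(frames, ignore_index=True)


# Usage, assuming 'file.xlsx' exists and an Excel engine such as openpyxl is installed:
#     with pd.ExcelFile("file.xlsx") as book:
#         df = concat_workbook_sheets(book, skiprows=1)
```

Opening the workbook once through ExcelFile and parsing each sheet from it avoids reopening the file for every sheet.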

Answer 4

Answered by malathivenkatesan

import glob
import os

import pandas as pd

file_save_location = 'myfolder'
file_name = 'filename'

location = 'myfolder1'
os.chdir(location)
# Every Excel workbook in the source folder.
excel_names = glob.glob('*.xls*')
# Sheet names are taken from the first workbook; the others are assumed to match.
sheets = pd.ExcelFile(excel_names[0]).sheet_names


def combine_excel_to_dfs(excel_names, sheet_name):
    # Read the same sheet from each workbook and stack the rows.
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df


for sheet in sheets:
    consolidated_file = combine_excel_to_dfs(excel_names, sheet)
    # One CSV per sheet, so later sheets do not overwrite earlier ones.
    out_path = os.path.join(file_save_location, f'{file_name}_{sheet}.csv')
    consolidated_file.to_csv(out_path)

print('Consolidation done')
