pandas 如何连接同一个文件中的多个excel表?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/46605910/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
how to concatenate multiple excel sheets from the same file?
提问by ??????
I have a big excel file that contains many different sheets. All the sheets have the same structure like:
我有一个包含许多不同工作表的大 excel 文件。所有工作表都具有相同的结构,例如:
Name
col1 col2 col3 col4
1 1 2 4
4 3 2 1
- How can I concatenate (vertically) all these sheets in
Pandas
without having to name each of them manually? If these were files, I could useglob
to obtain a list of files in a directory. But here, for excel sheets, I am lost. - Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from?
- 我怎样才能(垂直)连接所有这些工作表
Pandas
而不必手动命名它们?如果这些是文件,我可以glob
用来获取目录中的文件列表。但是在这里,对于excel表,我迷路了。 - 有没有办法在结果数据框中创建一个变量来标识数据来自的工作表名称?
Thanks!
谢谢!
回答by MaxU
Try this:
尝试这个:
dfs = pd.read_excel(filename, sheetname=None, skiprows=1)
this will return you a dictionary of DFs, which you can easily concatenate using pd.concat(dfs)
or as @jezrael has already posted in his answer:
这将为您返回一个 DF 字典,您可以轻松地使用pd.concat(dfs)
或如@jezrael 已经在他的回答中发布的那样进行连接:
df = pd.concat(pd.read_excel(filename, sheetname=None, skiprows=1))
sheetname: None -> All sheets as a dictionary of DataFrames
sheetname: None -> 所有工作表作为 DataFrames 的字典
UPDATE:
更新:
Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from?
有没有办法在结果数据框中创建一个变量来标识数据来自的工作表名称?
dfs = pd.read_excel(filename, sheetname=None, skiprows=1)
assuming we've got the following dict:
假设我们有以下字典:
In [76]: dfs
Out[76]:
{'d1': col1 col2 col3 col4
0 1 1 2 4
1 4 3 2 1, 'd2': col1 col2 col3 col4
0 3 3 4 6
1 6 5 4 3}
Now we can add a new column:
现在我们可以添加一个新列:
In [77]: pd.concat([df.assign(name=n) for n,df in dfs.items()])
Out[77]:
col1 col2 col3 col4 name
0 1 1 2 4 d1
1 4 3 2 1 d1
0 3 3 4 6 d2
1 6 5 4 3 d2
回答by jezrael
First add parameter sheetname=None
for dict
of DataFrames
and skiprows=1
for omit first row and then use concat
for MultiIndex DataFrame
.
首先添加参数sheetname=None
for dict
ofDataFrames
和skiprows=1
for 省略第一行,然后使用concat
for MultiIndex DataFrame
。
Last use reset_index
for column from first level:
reset_index
第一级列的最后一次使用:
df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheetname=None, skiprows=1))
df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()
回答by blacksite
Taking a note from this question:
从这个问题中记下:
import pandas as pd
file = pd.ExcelFile('file.xlsx')
names = file.sheet_names # see all sheet names
df = pd.concat([file.parse(name) for name in names])
Results:
结果:
df
Out[6]:
A B
0 1 3
1 2 4
0 5 6
1 7 8
Then you can run df.reset_index()
, to, well, reset the index.
然后您可以运行df.reset_index()
, 以重置索引。
Edit: pandas.ExcelFile.parse
is, according to the pandas docs:
编辑:pandas.ExcelFile.parse
是,根据Pandas文档:
Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters
等效于 read_excel(ExcelFile, ...) 有关可接受参数的更多信息,请参阅 read_excel 文档字符串
回答by malathivenkatesan
file_save_location='myfolder'
file_name='filename'
location = ''myfolder1'
os.chdir(location)
files_xls = glob.glob("*.xls*")
excel_names=[f for f in files_xls]
sheets = pd.ExcelFile(files_xls[0]).sheet_names
def combine_excel_to_dfs(excel_names, sheet_name):
sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
combined_df = pd.concat(sheet_frames).reset_index(drop=True)
return combined_df
i = 0
while i < len(sheets):
process = sheets[i]
consolidated_file= combine_excel_to_dfs(excel_names, process)
consolidated_file.to_csv(file_save_location+file_name+'.csv')
i = i+1
else:
"we done on consolidation part"