Python Loop through Excel sheets, place into one df
Disclaimer: this page mirrors a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44549110/
Question by Yolo_chicken
I have an Excel file foo.xlsx with about 40 sheets (sh1, sh2, etc.). Each sheet has the format:
area cnt name\nparty1 name\nparty2
blah 9 5 5
word 3 7 5
In each sheet I want to rename the columns of the form name\nparty so that only the party remains as the label. Example output:
area cnt party1 party2 sheet
bacon 9 5 5 sh1
spam 3 7 5 sh1
eggs 2 18 4 sh2
I am reading in the file with:
import pandas as pd
book = pd.ExcelFile(path)  # path points to foo.xlsx
And then I'm wondering if I need to do:
for sheet in book.sheet_names:
    df = book.parse(sheet_name=sheet)
    # more operations here
    # only change column names 2 and 3
    new_names = {}
    for i, col in enumerate(df.columns):
        if 2 <= i <= 3:
            new_names[col] = col.split("\n")[-1]
    df = df.rename(columns=new_names)
Or something like that?
Answer by asongtoruin
UPDATE as of 2019-09-09: use sheet_name instead of sheetname (pandas v0.25.1). The older sheetname keyword was deprecated in pandas 0.21 and is not accepted by current versions.
The read_excel function of pandas lets you read all sheets at once if you set the keyword parameter sheet_name=None (sheetname=None in older versions). This returns a dictionary: the keys are the sheet names, and the values are the sheets as DataFrames.
Using this, we can simply loop through the dictionary and:
- Add an extra column to each DataFrame containing the relevant sheet name
- Use the rename method to rename our columns: using a lambda, we take the final entry of the list obtained by splitting each column name on newline characters. If there is no newline, the column name is unchanged.
- Append the sheet to the "full table"
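The first two steps can be tried on a single sheet without an Excel file. A minimal sketch, assuming a hypothetical in-memory DataFrame that mirrors the question's column layout and a made-up sheet name "sh1":

```python
import pandas as pd

# Hypothetical stand-in for one sheet, using the column layout from the question
sheet = pd.DataFrame(
    [["blah", 9, 5, 5], ["word", 3, 7, 5]],
    columns=["area", "cnt", "name\nparty1", "name\nparty2"],
)

# Step 1: record which sheet the rows came from ("sh1" is a made-up name)
sheet["sheet"] = "sh1"

# Step 2: keep only the text after the last newline in each column name;
# names without a newline ("area", "cnt", "sheet") come back unchanged
sheet = sheet.rename(columns=lambda x: x.split("\n")[-1])

print(list(sheet.columns))
# ['area', 'cnt', 'party1', 'party2', 'sheet']
```

Because the lambda is applied to every column, it also touches columns outside positions 2 and 3; that is harmless here since only the party columns contain a newline.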
Once this is done, we reset the index and all should be well. Note: if some parties appear on one sheet but not on others, this still works, but any column missing from a sheet is filled with NaN for that sheet's rows.
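The NaN fill described in the note comes from pd.concat taking the union of all column names across the pieces it joins. A minimal sketch with two hypothetical single-row sheets, where party3 exists only on the second one:

```python
import pandas as pd

# Two hypothetical sheets: party3 appears only on the second one
sh1 = pd.DataFrame({"area": ["bacon"], "party1": [5], "sheet": ["sh1"]})
sh2 = pd.DataFrame({"area": ["eggs"], "party1": [18], "party3": [4], "sheet": ["sh2"]})

# Outer join on columns: sh1's row gets NaN in party3
full_table = pd.concat([sh1, sh2])
# Both pieces start at index 0, so renumber the combined rows 0..n-1
full_table.reset_index(inplace=True, drop=True)

print(full_table[["area", "party1", "party3", "sheet"]])
```

Since NaN is a float, a party column that is missing on any sheet ends up with a float dtype even if its values are whole numbers.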
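import pandas as pd
# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)
frames = []
for name, sheet in sheets_dict.items():
    # record which sheet each row came from
    sheet['sheet'] = name
    # keep only the text after the last newline, e.g. 'name\nparty1' -> 'party1'
    sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    frames.append(sheet)
# DataFrame.append was removed in pandas 2.0, so concatenate all sheets at once
full_table = pd.concat(frames)
full_table.reset_index(inplace=True, drop=True)
print(full_table)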
Prints:
area cnt party1 party2 sheet
0 bacon 9 5 5 Sheet1
1 spam 3 7 5 Sheet1
2 eggs 2 18 4 Sheet2
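As an alternative to the explicit loop (this is not part of the original answer), pd.concat also accepts the dictionary itself and turns its keys into an outer index level, which can then be moved into the sheet column. A minimal sketch, assuming a hypothetical in-memory dict that stands in for the result of pd.read_excel(..., sheet_name=None) and uses the question's sh1/sh2 sheet names:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=None): {sheet name: DataFrame}
sheets_dict = {
    "sh1": pd.DataFrame(
        {"area": ["bacon", "spam"], "cnt": [9, 3],
         "name\nparty1": [5, 7], "name\nparty2": [5, 5]}
    ),
    "sh2": pd.DataFrame(
        {"area": ["eggs"], "cnt": [2], "name\nparty1": [18], "name\nparty2": [4]}
    ),
}

full_table = (
    # dict keys become the outer index level, named "sheet"
    pd.concat(sheets_dict, names=["sheet", None])
    # move that level into an ordinary column
    .reset_index(level="sheet")
    # discard the per-sheet row numbers and renumber 0..n-1
    .reset_index(drop=True)
    # same newline-splitting rename as in the loop
    .rename(columns=lambda x: x.split("\n")[-1])
)
print(full_table)
```

One difference from the loop: reset_index inserts the sheet column first rather than last.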