Python: Loop through Excel sheets, place into one df

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/44549110/

Date: 2020-08-20 00:07:52  Source: igfitidea


Tags: python, excel, pandas

Question by Yolo_chicken

I have an Excel file foo.xlsx with about 40 sheets: sh1, sh2, etc. Each sheet has the format:


area      cnt   name\nparty1   name\nparty2
blah      9         5               5
word      3         7               5

In each sheet I want to rename the columns with the format name\nparty so that only the party remains as the label. Example output:


area      cnt    party1    party2     sheet
bacon     9         5         5        sh1
spam      3         7         5        sh1
eggs      2         18        4        sh2
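The renaming asked for here boils down to keeping whatever follows the last newline in a header. A minimal sketch on the column labels from the sample sheet (the labels come from the question; the helper name strip_prefix is hypothetical):

```python
def strip_prefix(label):
    """Return the part of a column label after its last newline (unchanged if there is none)."""
    return label.split("\n")[-1]

# Column labels as they appear in each sheet; "\n" is the line break inside the Excel header cell
columns = ["area", "cnt", "name\nparty1", "name\nparty2"]
renamed = [strip_prefix(c) for c in columns]
print(renamed)
```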

I am reading in the file with:


book = pd.ExcelFile(path) 

And then wondering if I need to do:


for sheet_name in book.sheet_names:
    df = book.parse(sheet_name)
    # more operations here
    # only change column names 2 and 3
    new_names = {}
    for i, col in enumerate(df.columns):
        if 2 <= i <= 3:
            new_names[col] = col.split("\n")[-1]
    df = df.rename(columns=new_names)

Or something like that?


Answer by asongtoruin

UPDATE as of 2019-09-09:


Use sheet_name instead of sheetname as of v0.25.1.

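For scripts that have to run on both older and newer pandas, one hedged option is to check which keyword the installed read_excel accepts instead of hard-coding it. This sketch relies on inspect.signature from the standard library; the file name in the commented-out call is hypothetical.

```python
import inspect

import pandas as pd

# Look up the parameters accepted by the installed pandas.read_excel
params = inspect.signature(pd.read_excel).parameters

# Newer pandas uses sheet_name; older releases only knew sheetname
sheet_kw = "sheet_name" if "sheet_name" in params else "sheetname"

# Hypothetical usage: read every sheet into a dict of DataFrames
# sheets_dict = pd.read_excel("Book1.xlsx", **{sheet_kw: None})
print(sheet_kw)
```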



The read_excel function of pandas lets you read all sheets at once if you set the keyword parameter sheetname=None (sheet_name=None in newer versions, per the update above). This returns a dictionary: the keys are the sheet names, and the values are the sheets as DataFrames.


Using this, we can simply loop through the dictionary and:


  1. Add an extra column to each DataFrame containing the relevant sheet name
  2. Use the rename method to rename our columns: with a lambda, we split each column name at every newline and take the last entry of the resulting list. If there is no newline, the column name is unchanged.
  3. Append to the "full table"

Once this is done, we reset the index and all should be well. Note: if some parties appear on one sheet but not on others, this still works, but any column missing from a sheet is filled with NaN for that sheet's rows.

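The NaN filling described in the note can be reproduced on two toy sheets. This sketch uses pd.concat, which aligns columns by name in the same way as the append-based loop; the sheet contents are made up for illustration, and party3 is a hypothetical column present on only one sheet.

```python
import pandas as pd

# Hypothetical sheets: "party3" only exists on the second one
sh1 = pd.DataFrame({"area": ["bacon"], "party1": [5]})
sh2 = pd.DataFrame({"area": ["eggs"], "party1": [18], "party3": [4]})

# concat aligns on column names; cells with no source value become NaN
combined = pd.concat([sh1, sh2], ignore_index=True)
print(combined)
```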

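import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)

frames = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    # keep only the text after the last newline in each column name
    sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
    frames.append(sheet)

# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
full_table = pd.concat(frames)
full_table.reset_index(inplace=True, drop=True)

print(full_table)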

Prints:


    area  cnt  party1  party2   sheet
0  bacon    9       5       5  Sheet1
1   spam    3       7       5  Sheet1
2   eggs    2      18       4  Sheet2
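A possible alternative (a swapped-in technique, not the answer's method) is to let pd.concat record the sheet names itself through its keys argument, instead of adding the column inside the loop. In this sketch a hand-built dict stands in for the result of pd.read_excel(..., sheet_name=None), using the rows from the question's example output, so it runs without an Excel file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None): {sheet name: DataFrame}
sheets_dict = {
    "sh1": pd.DataFrame({"area": ["bacon", "spam"], "cnt": [9, 3],
                         "name\nparty1": [5, 7], "name\nparty2": [5, 5]}),
    "sh2": pd.DataFrame({"area": ["eggs"], "cnt": [2],
                         "name\nparty1": [18], "name\nparty2": [4]}),
}

# Concatenate once, using the sheet names as the outer index level named "sheet"
full_table = pd.concat(list(sheets_dict.values()), keys=list(sheets_dict),
                       names=["sheet", None])

# Move the "sheet" level into a regular column, then renumber the rows 0..n-1
full_table = full_table.reset_index(level="sheet").reset_index(drop=True)

# Same renaming rule as the answer: keep the text after the last newline
full_table = full_table.rename(columns=lambda c: c.split("\n")[-1])
print(full_table)
```

Here the sheet column ends up first rather than last; select the columns in the desired order afterwards if that matters.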