Python Pandas: Reading Excel with merged cells

Disclaimer: this page is a bilingual (Chinese-English) version of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/22937650/

Date: 2020-08-19 02:00:56  Source: igfitidea

Tags: python, excel, pandas

Question by iayork

I have Excel files with multiple sheets, each of which looks a little like this (but much longer):

        Sample  CD4     CD8
Day 1   8311    17.3    6.44
        8312    13.6    3.50
        8321    19.8    5.88
        8322    13.5    4.09
Day 2   8311    16.0    4.92
        8312    5.67    2.28
        8321    13.0    4.34
        8322    10.6    1.95

The first column is actually four cells merged vertically.

When I read this using pandas.read_excel, I get a DataFrame that looks like this:

       Sample    CD4   CD8
Day 1    8311  17.30  6.44
NaN      8312  13.60  3.50
NaN      8321  19.80  5.88
NaN      8322  13.50  4.09
Day 2    8311  16.00  4.92
NaN      8312   5.67  2.28
NaN      8321  13.00  4.34
NaN      8322  10.60  1.95

How can I either get Pandas to understand merged cells, or quickly and easily remove the NaN and group by the appropriate value? (One approach would be to reset the index, step through to find the values and replace NaNs with values, pass in the list of days, then set the index to the column. But it seems like there should be a simpler approach.)
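
The manual approach described in the question can be sketched roughly as follows. This is a minimal illustration only. It assumes the merged column has ended up as the index, as in the output above, and rebuilds the DataFrame in memory from the values in the post instead of reading a real Excel file. The names `tmp`, `labels` and `manual` are hypothetical.

```python
import numpy as np
import pandas as pd

# The question's frame rebuilt in memory (values copied from the post).
# Rows under each merged "Day N" cell come back as NaN in the index.
df = pd.DataFrame(
    {
        "Sample": [8311, 8312, 8321, 8322] * 2,
        "CD4": [17.3, 13.6, 19.8, 13.5, 16.0, 5.67, 13.0, 10.6],
        "CD8": [6.44, 3.50, 5.88, 4.09, 4.92, 2.28, 4.34, 1.95],
    },
    index=["Day 1", np.nan, np.nan, np.nan, "Day 2", np.nan, np.nan, np.nan],
)

# 1. Move the index into an ordinary column (reset_index names it "index").
tmp = df.reset_index()

# 2. Step down that column, replacing each NaN with the last label seen.
labels = []
last_seen = None
for value in tmp["index"]:
    if pd.notna(value):
        last_seen = value
    labels.append(last_seen)
tmp["index"] = labels

# 3. Make the filled column the index again and drop its "index" name.
manual = tmp.set_index("index").rename_axis(None)
print(manual)
```

This loop is exactly what a forward fill does, which is why pandas' built-in forward-fill methods make it unnecessary.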

Accepted answer by unutbu

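You could use the Series.fillna method to forward-fill the NaN values (in current pandas versions, Series.ffill() does the same thing, and fillna(method='ffill') is deprecated):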

import pandas as pd
df.index = pd.Series(df.index).ffill()  # forward-fill the NaN index labels


For example,

In [42]: df
Out[42]: 
       Sample    CD4   CD8
Day 1    8311  17.30  6.44
NaN      8312  13.60  3.50
NaN      8321  19.80  5.88
NaN      8322  13.50  4.09
Day 2    8311  16.00  4.92
NaN      8312   5.67  2.28
NaN      8321  13.00  4.34
NaN      8322  10.60  1.95

[8 rows x 3 columns]

In [43]: df.index = pd.Series(df.index).ffill()

In [44]: df
Out[44]: 
       Sample    CD4   CD8
Day 1    8311  17.30  6.44
Day 1    8312  13.60  3.50
Day 1    8321  19.80  5.88
Day 1    8322  13.50  4.09
Day 2    8311  16.00  4.92
Day 2    8312   5.67  2.28
Day 2    8321  13.00  4.34
Day 2    8322  10.60  1.95

[8 rows x 3 columns]
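
Once the index is filled, the "group by the appropriate value" part of the question becomes an ordinary groupby. The following is a minimal sketch, assuming the same in-memory reconstruction of the question's data (no Excel file involved). Per-day means and the `(day, Sample)` MultiIndex are only illustrative choices. The names `per_day` and `indexed` are hypothetical.

```python
import numpy as np
import pandas as pd

# The question's frame rebuilt in memory (values copied from the post).
df = pd.DataFrame(
    {
        "Sample": [8311, 8312, 8321, 8322] * 2,
        "CD4": [17.3, 13.6, 19.8, 13.5, 16.0, 5.67, 13.0, 10.6],
        "CD8": [6.44, 3.50, 5.88, 4.09, 4.92, 2.28, 4.34, 1.95],
    },
    index=["Day 1", np.nan, np.nan, np.nan, "Day 2", np.nan, np.nan, np.nan],
)

# Same forward fill of the index labels as in the answer.
df.index = pd.Series(df.index).ffill()

# Group on the completed labels, e.g. per-day means of the measurements.
per_day = df.groupby(level=0)[["CD4", "CD8"]].mean()
print(per_day)

# Or combine day and sample into a MultiIndex for direct lookups.
indexed = df.set_index("Sample", append=True)
print(indexed.loc[("Day 2", 8312)])
```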

Answer by Muth

df = df.ffill(axis=0)  # forward-fill NaNs down every column (values only, not the index)
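
DataFrame.ffill fills column values, not index labels. This answer therefore applies when the merged column was read as an ordinary column rather than as the index. Filling the whole frame also fills any genuinely missing measurements, so a narrower variant is to forward-fill only the merged column. The sketch below is hedged. It assumes the sheet was read without `index_col` and that the blank header cell comes through as `"Unnamed: 0"` (pandas' usual default name for an unnamed column). The data is rebuilt in memory. The new column name `"Day"` is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed shape of read_excel's result without index_col: the merged column is an
# ordinary column, assumed to be named "Unnamed: 0" because its header cell is blank.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["Day 1", np.nan, np.nan, np.nan, "Day 2", np.nan, np.nan, np.nan],
        "Sample": [8311, 8312, 8321, 8322] * 2,
        "CD4": [17.3, 13.6, 19.8, 13.5, 16.0, 5.67, 13.0, 10.6],
        "CD8": [6.44, 3.50, 5.88, 4.09, 4.92, 2.28, 4.34, 1.95],
    }
)

df = df.rename(columns={"Unnamed: 0": "Day"})  # hypothetical friendlier name

# Forward-fill only the merged column; df.ffill() would also fill any
# genuinely missing values in the CD4/CD8 columns.
df["Day"] = df["Day"].ffill()

per_day = df.groupby("Day")[["CD4", "CD8"]].mean()
print(per_day)
```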