Note: this content is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, give the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/20633506/

Date: 2020-09-13 21:27:40  Source: igfitidea

How to solve the Pandas issue related to Series.fillna()?

Tags: python, pandas

Asked by bigbug

I just upgraded from Pandas 0.11 to 0.13.0rc1. The upgrade caused an error related to Series.fillna().

>>> df
                   sales  net_pft
STK_ID RPT_Date                  
600809 20060331   5.8951   1.1241
       20060630   8.3031   1.5464
       20060930  11.9084   2.2990
       20061231      NaN   2.6060
       20070331   5.9129   1.3334

[5 rows x 2 columns]
>>> type(df['sales'])
<class 'pandas.core.series.Series'>
>>> df['sales'] = df['sales'].fillna(df['net_pft'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\site-packages\pandas\core\generic.py", line 1912, in fillna
    obj.fillna(v, inplace=True)
AttributeError: 'numpy.float64' object has no attribute 'fillna'
>>> 

Why does df['sales'] become a 'numpy.float64' object when it is used in fillna()? How do I correctly "fill the NaN values of one column with the values of another column"?

Accepted answer by joris

There was a recent discussion on this, and it has been fixed in pandas master: https://github.com/pydata/pandas/issues/5703 (the fix came after the release of 0.13rc1, so it will be in the final 0.13).

Note: the behaviour has changed! As @behzad.nouri points out, using a Series as input to fillna was not supported behaviour in pandas <= 0.12. It did work, but apparently it matched values by position, which was wrong. As long as both Series (df['sales'] and df['net_pft'] in your case) have the same index, this does not matter.
In pandas 0.13 it will be supported, but values will be matched on the index of the Series. See the comment here: https://github.com/pydata/pandas/issues/5703#issuecomment-30663525
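
A minimal sketch of the difference. It runs on a current pandas release, not 0.13, and uses made-up data. The two Series carry the same labels in a different order, so filling by label and filling by position give different answers. The `where` call with a plain NumPy array only illustrates position-based filling. It is not the 0.12 code path.

```python
import numpy as np
import pandas as pd

# Made-up data: same labels, but net_pft lists them in a different order.
sales = pd.Series([5.8951, np.nan, 5.9129], index=[10, 20, 30])
net_pft = pd.Series([2.6060, 1.1241, 1.3334], index=[20, 10, 30])

# fillna with a Series aligns on index labels:
# the NaN at label 20 is filled with net_pft[20], i.e. 2.6060.
filled = sales.fillna(net_pft)

# Illustration of position-based filling (not pandas' own old code):
# a bare array is used position by position, so position 1 gets 1.1241.
positional = sales.where(sales.notna(), net_pft.to_numpy())

print(filled)
print(positional)
```

With identical indexes in the same order, both approaches agree. That is why the difference did not matter in the question.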

Answer by behzad.nouri

It seems more like what you are trying to do is:

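# boolean mask of the rows where 'sales' is missing
idx = df['sales'].isnull()
# one .loc call (rows and column together) writes into df itself, avoiding chained assignment
df.loc[idx, 'sales'] = df.loc[idx, 'net_pft']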
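
The masking approach can be tried end to end by rebuilding the question's frame by hand. The values are copied from the session in the question. Building it with `MultiIndex.from_product` is an assumption about how the original frame was constructed. This sketch targets a current pandas release and assigns with a single `.loc` call, so the write lands in `df` directly.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the session above).
index = pd.MultiIndex.from_product(
    [[600809], [20060331, 20060630, 20060930, 20061231, 20070331]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [5.8951, 8.3031, 11.9084, np.nan, 5.9129],
        "net_pft": [1.1241, 1.5464, 2.2990, 2.6060, 1.3334],
    },
    index=index,
)

# Boolean mask of the rows where 'sales' is missing.
idx = df["sales"].isnull()

# Select rows and column in one .loc call so the assignment modifies df,
# not a temporary copy.
df.loc[idx, "sales"] = df.loc[idx, "net_pft"]

print(df)
```

In a recent pandas, `df['sales'] = df['sales'].fillna(df['net_pft'])` should give the same result here, since both columns share the frame's index.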

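Because what you are passing as the value argument to fillna is a Series, the code goes into the branch below. For every (label, value) pair of the provided Series, that branch looks up result[k] and calls fillna on it. If self were a DataFrame this would work correctly: result[k] would be a whole column, and it would fillna each column using the corresponding value from the provided Series. But since self here is a Series, result[k] is a single scalar (a numpy.float64). Scalars have no fillna method, so the call fails with the AttributeError shown in the question.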
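
The failure can be reproduced by re-creating that loop by hand. This is a hypothetical sketch on a current pandas: the two-row data and the loop are illustrative, not the library code itself. Looking up a label on a Series returns a single numpy.float64, and calling fillna on that scalar raises the same kind of AttributeError as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row versions of the question's columns.
sales = pd.Series([5.8951, np.nan], index=["a", "b"])
net_pft = pd.Series([1.1241, 2.6060], index=["a", "b"])

# Hand-made imitation of the per-key loop in the dict/Series branch
# quoted below, but with a Series in place of a DataFrame.
result = sales.copy()
element_type = None
error_message = None
try:
    for k, v in net_pft.items():
        obj = result[k]               # label lookup on a Series -> one scalar
        element_type = type(obj)
        obj.fillna(v, inplace=True)   # scalars have no fillna method
except AttributeError as exc:
    error_message = str(exc)

print(element_type)
print(error_message)
```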

According to the documentation for fillna on a DataFrame, the value parameter can be

alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled).

From the source code below, if value is a Series, it works the same way as a dict: the Series' index supplies the keys that select which columns to fillna.
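
A short sketch of that column-by-column behaviour on a current pandas release, with made-up columns. The Series' index says which column each fill value belongs to. A column missing from that index keeps its NaN, just as with a dict.

```python
import numpy as np
import pandas as pd

# Made-up frame; 'other' will not be mentioned in the fill values.
frame = pd.DataFrame(
    {
        "sales": [5.8951, np.nan, 5.9129],
        "net_pft": [1.1241, np.nan, 1.3334],
        "other": [np.nan, 1.0, 2.0],
    }
)

# Index labels name the columns; each value fills that column's NaNs.
fill_values = pd.Series({"sales": 0.0, "net_pft": -1.0})
filled = frame.fillna(fill_values)

print(filled)
```

Passing the equivalent dict, `{"sales": 0.0, "net_pft": -1.0}`, should fill the frame the same way.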

    else:   # value is not None
        if method is not None:
            raise ValueError('cannot specify both a fill method and value')

        if len(self._get_axis(axis)) == 0:
            return self
        if isinstance(value, (dict, com.ABCSeries)):
            if axis == 1:
                raise NotImplementedError('Currently only can fill '
                                          'with dict/Series column '
                                          'by column')

            result = self if inplace else self.copy()
            for k, v in compat.iteritems(value):
                if k not in result:
                    continue
                obj = result[k]
                obj.fillna(v, inplace=True)
            return result
        else:
            new_data = self._data.fillna(value, inplace=inplace,
                                         downcast=downcast)