How to resolve a Pandas issue related to Series.fillna()?

Question

Asked by bigbug

I just upgraded from Pandas 0.11 to 0.13.0rc1. The upgrade caused an error related to Series.fillna().

>>> df
                   sales  net_pft
STK_ID RPT_Date                  
600809 20060331   5.8951   1.1241
       20060630   8.3031   1.5464
       20060930  11.9084   2.2990
       20061231      NaN   2.6060
       20070331   5.9129   1.3334

[5 rows x 2 columns]
>>> type(df['sales'])
<class 'pandas.core.series.Series'>
>>> df['sales'] = df['sales'].fillna(df['net_pft'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\site-packages\pandas\core\generic.py", line 1912, in fillna
    obj.fillna(v, inplace=True)
AttributeError: 'numpy.float64' object has no attribute 'fillna'
>>>

Why does df['sales'] become a 'numpy.float64' object when it is used in fillna()? How do I correctly fill the NaN values of one column with the other column's values?

Answer 1

Accepted answer by joris

There was a recent discussion on this, and it is fixed in pandas master: https://github.com/pydata/pandas/issues/5703 (the fix came after the release of 0.13rc1, so it will be in the final 0.13).

Note: the behaviour changed! As @behzad.nouri points out, using a Series as input to fillna was not supported behaviour in pandas <= 0.12. It did work, but it was apparently based on location, which was wrong. As long as both Series (df['sales'] and df['net_pft'] in your case) have the same index, the difference will not matter.
In pandas 0.13 it will be supported, but based on the index of the Series. See the comment here: https://github.com/pydata/pandas/issues/5703#issuecomment-30663525
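
To illustrate the index-based behaviour described above, here is a minimal sketch, assuming a pandas version in which Series.fillna accepts a Series. The labels "a", "b", "c" and the reordered second Series are made up for the example; the values are borrowed from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the labels are invented for illustration
sales = pd.Series([5.8951, np.nan, 11.9084], index=["a", "b", "c"])

# Same labels, deliberately stored in a different order
net_pft = pd.Series([2.2990, 1.1241, 1.5464], index=["c", "a", "b"])

# The fill value is looked up by label, not by position
filled = sales.fillna(net_pft)

# The NaN at label "b" takes net_pft["b"] (1.5464),
# not the value stored at position 1 of net_pft (1.1241)
print(filled)
```

When both Series share the same index in the same order, as in the question, label-based and position-based filling give the same result.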

Answer 2

Answered by behzad.nouri

It seems that what you are trying to do is:

idx = df['sales'].isnull()
df.loc[idx, 'sales'] = df.loc[idx, 'net_pft']

Because what you are providing as the value argument to fillna is a Series, the code goes into the branch below, which calls fillna for every index item of the provided Series. If self were a DataFrame this would have worked correctly, that is, it would fillna each column using the provided Series. Since self here is a Series, result[k] is a single numpy.float64 element rather than a column, and calling fillna on it raises the AttributeError.
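
The boolean-mask assignment suggested at the top of this answer can be tried on a frame shaped like the one in the question. This is a sketch, not the only way to do it: the frame is rebuilt by hand from the question's printout, the assignment is written with .loc to avoid chained indexing, and combine_first is shown as an alternative that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question's printout
index = pd.MultiIndex.from_tuples(
    [(600809, 20060331), (600809, 20060630), (600809, 20060930),
     (600809, 20061231), (600809, 20070331)],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {"sales": [5.8951, 8.3031, 11.9084, np.nan, 5.9129],
     "net_pft": [1.1241, 1.5464, 2.2990, 2.6060, 1.3334]},
    index=index,
)

# Alternative (an addition, not from the answer): take sales where present,
# otherwise fall back to net_pft, matched by index label
alt = df["sales"].combine_first(df["net_pft"])

# Mask approach: select the rows where sales is missing,
# then copy net_pft into exactly those rows
idx = df["sales"].isnull()
df.loc[idx, "sales"] = df.loc[idx, "net_pft"]

print(df)
```

Both approaches leave the non-missing sales values untouched and fill only the 20061231 row.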

As the documentation says, to fillna a DataFrame the parameter value can be

alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled).

From the source code below, if value is a Series it works the same way as a dict, using the Series' index as keys to fillna the corresponding columns.

    else:   # value is not None
        if method is not None:
            raise ValueError('cannot specify both a fill method and value')

        if len(self._get_axis(axis)) == 0:
            return self
        if isinstance(value, (dict, com.ABCSeries)):
            if axis == 1:
                raise NotImplementedError('Currently only can fill '
                                          'with dict/Series column '
                                          'by column')

            result = self if inplace else self.copy()
            for k, v in compat.iteritems(value):
                if k not in result:
                    continue
                obj = result[k]
                obj.fillna(v, inplace=True)
            return result
        else:
            new_data = self._data.fillna(value, inplace=inplace,
                                         downcast=downcast)
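
The column-by-column behaviour of that branch can be seen from the caller's side with a small sketch. The two-row frame and the fill values 0.0 and -1.0 are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column
df = pd.DataFrame({"sales": [5.8951, np.nan], "net_pft": [np.nan, 2.6060]})

# A dict maps column name -> fill value; columns not listed are left alone
by_dict = df.fillna({"sales": 0.0})

# A Series behaves the same way: its index labels are matched to column names
by_series = df.fillna(pd.Series({"sales": 0.0, "net_pft": -1.0}))

print(by_dict)
print(by_series)
```

In both calls the keys select columns, which is why passing a Series indexed by row labels to DataFrame.fillna would not fill row by row.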
