pandas: How to fill NaN values with a rolling mean

Question

Asked by VaM999

I have a dataframe that contains NaN values in a few places. I am trying to clean the data by filling each NaN value with the mean of its previous five instances. To do so, I have come up with the following.


input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it isn't filling the NaN values, and the dataframe's null count is the same before and after the operation. Assuming I have a dataframe with just an integer column, how can I fill NaN values with the mean of the previous five instances? Thanks in advance.


Answer 1

Accepted answer by Joe

This should work:


input_data_frame[var_list] = input_data_frame[var_list].fillna(pd.rolling_mean(input_data_frame[var_list], 6, min_periods=1))

Note that the window is 6 because it includes the NaN value itself (which is not counted in the average). Other NaN values in the window are also excluded from the average, so if fewer than 5 values are found in the window, the average is calculated over the values that are actually present.
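One way to check the window-of-6 reasoning is to compare it with a more explicit formulation: shift the column down by one row, then average up to five prior rows. The sketch below is only an illustration; the variable names are made up, and it uses the rolling() method rather than pd.rolling_mean, on the assumption that a recent pandas version is installed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan])

# Window of 6 ending at the current row; NaN entries are skipped in the mean.
window_six = s.rolling(6, min_periods=1).mean()

# Explicit alternative: drop the current row by shifting, then average
# up to five previous rows.
previous_five = s.shift(1).rolling(5, min_periods=1).mean()

# Compare the two only at the rows that will actually be filled.
nan_rows = s.isna()
print(window_six[nan_rows])
print(previous_five[nan_rows])
```

At the NaN rows both expressions give the same fill value, because the current row contributes nothing to the 6-row window. They differ at rows that already hold a value, but fillna never uses those.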


Example:


import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

Output:
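As a sketch of what the fill produces on this example, the snippet below applies the same idea with the rolling() method, assuming a pandas version where pd.rolling_mean is no longer available. The name filled is only illustrative. Rows 6 and 13 each become 3.0, the mean of the five values (1, 2, 3, 4, 5) before them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})

# Fill each NaN with the mean of the non-NaN values in the
# 6-row window that ends at that row.
filled = df.fillna(df.rolling(6, min_periods=1).mean())
print(filled)
```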


Answer 2

Answered by Caner Erden

The rolling_mean function has changed in pandas: pd.rolling_mean was deprecated and later removed in favor of the rolling() method. To fill the entire dataset, you can use:


filled_dataset = dataset.fillna(dataset.rolling(6,min_periods=1).mean())
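To fill only some columns, as in the question's var_list, the same expression can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: fill_with_rolling_mean, n_previous, and the sample data are hypothetical names and values introduced here. It assigns the result back instead of using inplace=True, since selecting columns with a list returns a copy and an in-place fill on that copy would not reach the original frame.

```python
import numpy as np
import pandas as pd

def fill_with_rolling_mean(frame, columns, n_previous=5):
    """Return a copy of frame with NaNs in columns filled by a rolling mean."""
    result = frame.copy()
    # The NaN row itself occupies one slot in the window, hence the +1.
    window = n_previous + 1
    rolled = result[columns].rolling(window, min_periods=1).mean()
    # Assign back rather than calling fillna(..., inplace=True) on the selection.
    result[columns] = result[columns].fillna(rolled)
    return result

# Hypothetical sample data; 'label' is left untouched.
input_data_frame = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, np.nan, 5.0],
    'y': [10.0, np.nan, 30.0, 40.0, 50.0],
    'label': ['a', 'b', 'c', 'd', 'e'],
})
var_list = ['x', 'y']
cleaned = fill_with_rolling_mean(input_data_frame, var_list)
print(cleaned)
```

A NaN with no earlier values in its window (for example, in the first row) has nothing to average, so it stays NaN under this approach.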
