
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/49172914/

Date: 2020-09-14 05:17:32  Source: igfitidea

How to fill nan values with rolling mean in pandas

Tags: python pandas dataframe nan mean

Asked by VaM999

I have a DataFrame that contains NaN values in a few places. I am trying to clean the data by filling each NaN value with the mean of its previous five instances. To do so, I came up with the following:


input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it doesn't fill the NaN values, and the DataFrame's null count is the same before and after the operation above. Assuming I have a DataFrame with just an integer column, how can I fill NaN values with the mean of the previous five instances? Thanks in advance.

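To see why the attempt above leaves the NaNs in place, here is a minimal sketch on a hypothetical one-column frame (the column name `x` and the data are made up). It shows two separate causes. First, `rolling(5)` defaults to `min_periods=5`, and the window ending at the NaN row contains that NaN, so the proposed replacement value is itself NaN. Second, `input_data_frame[var_list]` builds a new DataFrame, so `inplace=True` fills that temporary object rather than `input_data_frame` (pandas may also emit a copy/chained-assignment warning here, depending on the version).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column with a single gap at row 5.
input_data_frame = pd.DataFrame({"x": [1, 2, 3, 4, 5, np.nan, 7]})
var_list = ["x"]

# rolling(5) uses min_periods=5 by default. The window ending at row 5
# contains the NaN itself, so the proposed replacement there is NaN too.
replacement = input_data_frame[var_list].rolling(5).mean()
print(replacement.loc[5, "x"])  # nan

# input_data_frame[var_list] returns a new DataFrame, so inplace=True
# fills that temporary object; input_data_frame keeps its NaN even when
# the fill value itself is not NaN.
input_data_frame[var_list].fillna(0, inplace=True)
print(input_data_frame["x"].isna().sum())  # 1
```

Assigning the result back (`df[cols] = df[cols].fillna(...)`) avoids the second problem; the answers below address the first one.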

Accepted answer by Joe

This should work:


input_data_frame[var_list]= input_data_frame[var_list].fillna(pd.rolling_mean(input_data_frame[var_list], 6, min_periods=1))

Note that the window is 6 because it includes the NaN value itself (which is not counted in the average). The other NaN values in the window are not used for the average either, so if fewer than 5 values are found in the window, the average is calculated from the values actually present.

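A small sketch of both points in the note, using the current `Series.rolling` method and made-up data: a window of 6 at the NaN row averages exactly the five values before it, while a window of 5 averages only four. With two NaNs in a row, the second window holds only four real values, and the first NaN is skipped rather than replaced by its own fill value.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, np.nan])

# Windows end at (and include) the current row. At the NaN row, a window
# of 6 covers the NaN plus the five values before it...
print(s.rolling(6, min_periods=1).mean().iloc[5])  # 30.0, mean of 10..50
# ...while a window of 5 covers the NaN plus only four earlier values.
print(s.rolling(5, min_periods=1).mean().iloc[5])  # 35.0, mean of 20..50

# Two consecutive NaNs: the second window contains 2, 3, 4, 5 and two NaNs,
# so its average uses only the four values actually present.
t = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan])
filled = t.fillna(t.rolling(6, min_periods=1).mean())
print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 3.5]
```

Because the rolling mean is computed on the original data, an earlier fill never feeds into a later one.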

Example:


import numpy as np
import pandas as pd

data = {'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]}
df = pd.DataFrame(data=data)
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

Output after filling the NaN values as above:


      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   3.0
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  3.0
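On current pandas versions, `pd.rolling_mean` no longer exists (as far as I know it was deprecated in 0.18 and removed in 0.23), so here is a sketch reproducing the example with the `.rolling()` method instead. It fills the single column `a` and should give 3.0 at rows 6 and 13, matching the output shown.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})

# Same idea as the accepted answer: a 6-row window (the NaN plus five
# earlier rows), with min_periods=1 so NaNs inside the window are skipped.
# Series.rolling(...).mean() stands in for the removed pd.rolling_mean.
df['a'] = df['a'].fillna(df['a'].rolling(6, min_periods=1).mean())
print(df)
```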

Answer by Caner Erden

The top-level pd.rolling_mean function has been removed from pandas (deprecated in 0.18, removed in 0.23) in favor of the .rolling() method. If you are filling the entire dataset, you can use:


filled_dataset = dataset.fillna(dataset.rolling(6,min_periods=1).mean())
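A sketch contrasting the whole-dataset form with filling only some columns, on a hypothetical two-column frame (`x`, `y` and the data are made up; `var_list` mirrors the question's variable). `DataFrame.rolling(...).mean()` is computed per column and `fillna` aligns the result on index and column labels, so each column is filled from its own rolling mean. For a subset, select, fill, and assign back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two numeric columns, each with a gap at row 5.
dataset = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "y": [10.0, 20.0, 30.0, 40.0, 50.0, np.nan],
})

# Whole dataset: each column gets the mean of its own five preceding values.
filled_dataset = dataset.fillna(dataset.rolling(6, min_periods=1).mean())
print(filled_dataset.iloc[5].tolist())  # [3.0, 30.0]

# Only the columns in var_list: assign the filled subset back, so the
# other columns keep their NaNs.
var_list = ["x"]
subset_filled = dataset.copy()
subset_filled[var_list] = subset_filled[var_list].fillna(
    subset_filled[var_list].rolling(6, min_periods=1).mean())
print(subset_filled.isna().sum())  # x has no NaNs left, y still has one
```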