Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/40814201/

Pandas rolling gives NaN

python pandas

Asked by Huey

I'm looking at the tutorials on window functions, but I don't quite understand why the following code produces NaNs.

If I understand correctly, the code creates a rolling window of size 2. Why do the first, fourth, and fifth rows have NaN? At first, I thought it was because adding NaN to another number produces NaN, but then I'm not sure why the second row isn't NaN.

import numpy as np
import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                   index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))


In [58]: dft.rolling(2).sum()
Out[58]: 
                       B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  NaN
2013-01-01 09:00:04  NaN

Answered by Brian Huey

The first thing to notice is that, by default, rolling needs n valid (non-NaN) observations in each window, where n is the window size: the current row plus the n-1 rows before it. If that condition is not met, it returns NaN for the window. This is what happens at the first row, which has no prior row. In the fourth and fifth rows, it's because one of the values in the sum is NaN.
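
To make that rule visible, here is a minimal sketch that counts the valid observations in each window next to the default rolling sum. It rebuilds the question's dft; the names window, valid, rolled, and summary are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("20130101 09:00:00", periods=5, freq="s"),
)

window = 2

# Non-NaN observations per window. min_periods=1 is passed explicitly so the
# count is reported even for the partial first window.
valid = dft["B"].rolling(window, min_periods=1).count()

# Default behaviour: for an integer window, min_periods defaults to the
# window size, so a window with fewer valid observations yields NaN.
rolled = dft["B"].rolling(window).sum()

summary = pd.DataFrame({"B": dft["B"], "valid_obs": valid, "rolling_sum": rolled})
print(summary)
# valid_obs is 1, 2, 2, 1, 1; rolling_sum is NaN exactly where valid_obs < 2.
```

Rows one, four, and five each have only one valid observation in their window, which is why those are the rows that come back as NaN.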

If you would like to avoid returning NaN, you could pass min_periods=1 to the method, which reduces the minimum required number of valid observations in the window from 2 to 1:

>>> dft.rolling(2, min_periods=1).sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:04  4.0
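
As a self-contained sketch of the comparison, the snippet below puts the default and the min_periods=1 results side by side. It also includes a time-offset window ('2s') as an extra illustration that is not part of the answer above: for offset windows pandas defaults min_periods to 1, so it happens to give the same values here. The column names strict, relaxed, and by_time are made up for this example.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("20130101 09:00:00", periods=5, freq="s"),
)

comparison = pd.DataFrame({
    # Integer window with the default min_periods (equal to the window size).
    "strict": dft["B"].rolling(2).sum(),
    # At least one valid observation is enough; NaN values are skipped.
    "relaxed": dft["B"].rolling(2, min_periods=1).sum(),
    # Offset window covering the last 2 seconds; min_periods defaults to 1.
    "by_time": dft["B"].rolling("2s").sum(),
})
print(comparison)
```

Note that min_periods=1 does not turn NaN into 0: a window containing no valid observations at all would still produce NaN.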

Answered by John Zwinck

Indeed, adding NaN to anything else gives NaN. So:

input + rolled = sum
    0      nan   nan
    1        0     1
    2        1     3
  nan        2   nan
    4      nan   nan

There's no reason for the second row to be NaN, because it's the sum of the original first and second elements, neither of which is NaN.

Another way to do it is:

dft.B + dft.B.shift()
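
A quick way to check that the shift-based expression matches the default rolling sum for a window of 2 is sketched below. It rebuilds dft from the question; shifted_sum and rolled are illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("20130101 09:00:00", periods=5, freq="s"),
)

# Each row plus the row before it; the first row has no predecessor,
# so shift() supplies NaN there and the sum is NaN.
shifted_sum = dft["B"] + dft["B"].shift()

# Default rolling sum over a window of 2 rows.
rolled = dft["B"].rolling(2).sum()

# Raises AssertionError if the two differ; NaN positions are compared as equal.
pd.testing.assert_series_equal(shifted_sum, rolled)
print(shifted_sum)
```

Plain addition propagates NaN, so this form behaves like the default rolling(2) rather than like min_periods=1.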