Original source: http://stackoverflow.com/questions/40814201/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas rolling gives NaN
Asked by Huey
I'm looking at the tutorials on window functions, but I don't quite understand why the following code produces NaNs.
If I understand correctly, the code creates a rolling window of size 2. Why do the first, fourth, and fifth rows have NaN? At first, I thought it's because adding NaN with another number would produce NaN, but then I'm not sure why the second row wouldn't be NaN.
import numpy as np
import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                   index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))

In [58]: dft.rolling(2).sum()
Out[58]:
                       B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  NaN
2013-01-01 09:00:04  NaN
Answered by Brian Huey
The first thing to notice is that, by default, rolling requires every one of the n values in a window (the current row plus the n-1 rows before it) to be a valid, non-NaN observation, where n is the window size. If that condition is not met, it returns NaN for the window. This is what happens at the first row, which has no prior row. In the fourth and fifth rows, the window includes the NaN from the fourth row of the input, so the sum is NaN.
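To make the rule concrete, here is a minimal sketch that counts the valid (non-NaN) values in each window of size 2 and compares that count with where rolling(2).sum() returns a number. The names valid_counts and enough are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("2013-01-01 09:00:00", periods=5, freq="s"),
)

# Mark valid values as 1.0 and NaN as 0.0, then sum over each window.
# min_periods=1 is used here only so the count itself is never NaN.
valid_counts = dft["B"].notna().astype(float).rolling(2, min_periods=1).sum()

# With the default settings, a window of size 2 needs 2 valid values.
enough = valid_counts >= 2

result = dft["B"].rolling(2).sum()

print(pd.DataFrame({"valid_in_window": valid_counts,
                    "enough": enough,
                    "rolling_sum": result}))
```

The rows where enough is False should be exactly the rows where the rolling sum is NaN.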
If you would like to avoid returning NaN, you can pass min_periods=1 to the method, which reduces the minimum number of valid observations required in the window from 2 to 1:
>>> dft.rolling(2, min_periods=1).sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:04  4.0
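For a side-by-side view, the following sketch rebuilds the frame from the question and puts the default behaviour next to min_periods=1. The variable names strict, lenient, and comparison are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("2013-01-01 09:00:00", periods=5, freq="s"),
)

# Default: min_periods equals the window size (2) for an integer window.
strict = dft.rolling(2).sum()

# Relaxed: a single valid value in the window is enough to produce a result.
lenient = dft.rolling(2, min_periods=1).sum()

comparison = pd.DataFrame({"default": strict["B"],
                           "min_periods=1": lenient["B"]})
print(comparison)
```

Note that with min_periods=1 the NaN is skipped rather than treated as zero in any explicit way: the window at 09:00:03 contains 2 and NaN, and the sum is taken over the single valid value.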
Answered by John Zwinck
Indeed, adding NaN to anything else gives NaN. So:
input + rolled = sum
    0      nan   nan
    1        0     1
    2        1     3
  nan        2   nan
    4      nan   nan
There's no reason for the second row to be NaN, because it's the sum of the original first and second elements, neither of which is NaN.
Another way to do it is:
dft.B + dft.B.shift()
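As a quick check, this sketch compares the shift-based sum with rolling(2).sum() on the data from the question. It assumes a window of exactly 2; a larger window would need more shifted terms. The names shifted_sum, rolled_sum, and same are illustrative.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("2013-01-01 09:00:00", periods=5, freq="s"),
)

# shift() moves every value down one row, so each row is added to its predecessor.
shifted_sum = dft.B + dft.B.shift()

# The rolling equivalent with the default min_periods.
rolled_sum = dft.B.rolling(2).sum()

# Compare the two results, treating NaN in the same position as equal.
same = np.allclose(shifted_sum.to_numpy(), rolled_sum.to_numpy(), equal_nan=True)

print(pd.DataFrame({"input": dft.B,
                    "shifted": dft.B.shift(),
                    "sum": shifted_sum}))
print("matches rolling(2).sum():", same)
```

The shift version propagates NaN through ordinary addition, which is why it lines up with the default rolling behaviour and not with min_periods=1.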

