Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/27479800/
Pandas rolling on a shifted dataframe
Asked by euri10
Here's a piece of code. I don't understand why I get NaN for the first 4 items in the last column, rm-5.
I understand that the first 4 items of the rm column aren't filled because there is no data available yet, but if I shift the column first, the calculation should be possible, shouldn't it?
Similarly, I don't understand why 5 items, and not 4, are NaN at the end of the rm-5 column.
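import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
# pd.rolling_mean() was removed from later pandas releases; .rolling(5).mean() is the equivalent
df['rm'] = df['A'].rolling(5).mean()
# shift the column 5 rows up first, then take the 5-row rolling mean
df['rm-5'] = df['A'].shift(-5).rolling(5).mean()
print(df.head(n=8))
print(df.tail(n=8))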
                   A        rm      rm-5
2000-01-01  0.109161       NaN       NaN
2000-01-02 -0.360286       NaN       NaN
2000-01-03 -0.092439       NaN       NaN
2000-01-04  0.169439       NaN       NaN
2000-01-05  0.185829  0.002341  0.091736
2000-01-06  0.432599  0.067028  0.295949
2000-01-07 -0.374317  0.064222  0.055903
2000-01-08  1.258054  0.334321 -0.132972
                   A        rm      rm-5
2000-04-02  0.499860 -0.422931 -0.140111
2000-04-03 -0.868718 -0.458962 -0.182373
2000-04-04  0.081059 -0.443494 -0.040646
2000-04-05  0.500275 -0.093048       NaN
2000-04-06 -0.253915 -0.008288       NaN
2000-04-07 -0.159256 -0.140111       NaN
2000-04-08 -1.080027 -0.182373       NaN
2000-04-09  0.789690 -0.040646       NaN
Accepted answer by Hennep
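You can change the order of operations. At the moment you shift first and take the rolling mean afterwards. The shift by -5 leaves NaNs in the last 5 rows, and the 5-row window still needs 5 values before it produces a result, so the first 4 rows are NaN as well. If you take the rolling mean first and shift the result, only the last 5 rows are NaN.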
import pandas as pd
import numpy as np

index = pd.date_range('2000-1-1', periods=100, freq='D')
df = pd.DataFrame(data=np.random.randn(100), index=index, columns=['A'])
df['rm'] = df['A'].rolling(5).mean()
df['shift'] = df['A'].shift(-5)
# shift first, then rolling mean: NaN in the first 4 and the last 5 rows
df['rm-5-shift_first'] = df['A'].shift(-5).rolling(5).mean()
# rolling mean first, then shift: NaN only in the last 5 rows
df['rm-5-mean_first'] = df['A'].rolling(5).mean().shift(-5)
print(df.head(n=8))
print(df.tail(n=8))
                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-01-01 -0.120808       NaN  0.830231               NaN         0.184197
2000-01-02  0.029547       NaN  0.047451               NaN         0.187778
2000-01-03  0.002652       NaN  1.040963               NaN         0.395440
2000-01-04 -1.078656       NaN -1.118723               NaN         0.387426
2000-01-05  1.137210 -0.006011  0.469557          0.253896         0.253896
2000-01-06  0.830231  0.184197 -0.390506          0.009748         0.009748
2000-01-07  0.047451  0.187778 -1.624492         -0.324640        -0.324640
2000-01-08  1.040963  0.395440 -1.259306         -0.784694        -0.784694
                   A        rm     shift  rm-5-shift_first  rm-5-mean_first
2000-04-02 -1.283123 -0.270381  0.226257          0.760370         0.760370
2000-04-03  1.369342  0.288072  2.367048          0.959912         0.959912
2000-04-04  0.003363  0.299997  1.143513          1.187941         1.187941
2000-04-05  0.694026  0.400442       NaN               NaN              NaN
2000-04-06  1.508863  0.458494       NaN               NaN              NaN
2000-04-07  0.226257  0.760370       NaN               NaN              NaN
2000-04-08  2.367048  0.959912       NaN               NaN              NaN
2000-04-09  1.143513  1.187941       NaN               NaN              NaN
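Because the data above is random, the NaN pattern is easier to check by hand on a small deterministic series. The following is only a sketch: the 12-row series `s` and the names `shift_first` and `mean_first` are made up for illustration, and it assumes a pandas version that provides `Series.rolling` and `Series.isna`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the values 0.0 .. 11.0 on 12 consecutive days.
index = pd.date_range('2000-01-01', periods=12, freq='D')
s = pd.Series(np.arange(12, dtype=float), index=index)

# Order 1: shift 5 rows up, then take the 5-row rolling mean.
shift_first = s.shift(-5).rolling(5).mean()
# Order 2: take the 5-row rolling mean, then shift the result 5 rows up.
mean_first = s.rolling(5).mean().shift(-5)

# Count the NaN entries produced by each order of operations.
print("NaN count, shift first:", int(shift_first.isna().sum()))
print("NaN count, mean first: ", int(mean_first.isna().sum()))

# Where both columns hold a value, the values are the same.
both = shift_first.notna() & mean_first.notna()
print(pd.DataFrame({'shift_first': shift_first, 'mean_first': mean_first})[both])
```

With the shift applied first, the window has to fill up again after the shift, so the leading 4 rows stay empty in addition to the 5 trailing rows that the shift itself empties. With the mean applied first, the leading NaNs of the rolling mean are shifted off the top and only the 5 trailing rows are empty.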
For more, see:
http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.shift.html

