Using rolling functions on a multi-index dataframe in pandas
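Disclaimer: this page reproduces a popular StackOverflow question and answer under the CC BY-SA 4.0 license. If you use it, you must also comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow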
Original URL: http://stackoverflow.com/questions/19188217/
Question by silencer
I have a multi-index dataframe in pandas, where the index is on ID and timestamp. I want to compute a time-series rolling sum for each ID, but I can't seem to figure out how to do it without loops.
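import io
import pandas as pd

content = io.StringIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
df = pd.read_table(content, header=0, sep=r'\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)
# A plain rolling sum ignores the IDs level, so windows cross group boundaries.
# (The original used pd.stats.moments.rolling_sum(df, window=2), removed in later pandas.)
df.rolling(window=2).sum()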
And the output for this is:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    303
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    411
Notice that the windows overlap at the boundaries between IDs 0 and 1 and between IDs 1 and 2: for example, 303 = 3 + 300 adds the last value of ID 0 to the first value of ID 1. I don't want that; it messes up my calculations. One possible workaround is to group by IDs, loop over the groups, and apply rolling_sum to each one.
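For comparison, the loop-based workaround described above might look like the following minimal sketch in current pandas. It assumes the same sample data, rebuilt directly with pd.DataFrame rather than parsed from text, and uses DataFrame.rolling(...).sum() in place of the removed rolling_sum; the names pieces and looped are illustrative only.

```python
import pandas as pd

# Same sample data as the question, built directly instead of parsed from text.
df = pd.DataFrame(
    {
        "IDs": [0, 0, 0, 1, 1, 1, 2],
        "timestamp": pd.to_datetime(
            ["2010-10-30", "2010-11-30", "2011-11-30",
             "2000-01-01", "2007-01-01", "2010-01-01", "2000-01-01"]
        ),
        "value": [1, 2, 3, 300, 33, 400, 11],
    }
).set_index(["IDs", "timestamp"])

# Loop-based workaround: compute the rolling sum separately for each ID,
# so no window can reach into a neighboring ID, then stitch the pieces back.
pieces = []
for _, group in df.groupby(level="IDs"):
    pieces.append(group.rolling(window=2).sum())
looped = pd.concat(pieces)

print(looped)
```

This gives the desired per-ID result, but the explicit Python loop is exactly what the question hopes to avoid.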
I am sure there is a function to help me do this without using loops.
Answer by Jeff
Group first, then apply the rolling sum within each group (also, rolling_sum is available in the top-level namespace as pd.rolling_sum):
In [18]: df.groupby(level='IDs').apply(lambda x: pd.rolling_sum(x,2))
Out[18]:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    NaN
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    NaN
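pd.rolling_sum was later removed from pandas. A minimal sketch of the same group-then-roll idea with the current API follows; it assumes the question's sample data rebuilt directly as a DataFrame, and uses groupby(...).transform (a substitution for the answer's apply call) so the result stays aligned to the original (IDs, timestamp) index.

```python
import pandas as pd

# Same sample data as the question, built directly instead of parsed from text.
df = pd.DataFrame(
    {
        "IDs": [0, 0, 0, 1, 1, 1, 2],
        "timestamp": pd.to_datetime(
            ["2010-10-30", "2010-11-30", "2011-11-30",
             "2000-01-01", "2007-01-01", "2010-01-01", "2000-01-01"]
        ),
        "value": [1, 2, 3, 300, 33, 400, 11],
    }
).set_index(["IDs", "timestamp"])

# Group by the IDs level, then take a 2-row rolling sum inside each group.
# transform returns a Series aligned to the original index, so windows never
# cross from one ID into the next and no extra index level is added.
result = df.groupby(level="IDs")["value"].transform(
    lambda s: s.rolling(window=2).sum()
)

print(result)
```

An alternative is df.groupby(level='IDs').rolling(2).sum(); in recent pandas versions this may prepend the group key as an extra outer index level, which you would then remove with droplevel(0).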

