How to use the Pandas rolling_* functions on a forward-looking basis

Question

Asked by user2543645

Suppose I have a time series:

In [138]: rng = pd.date_range('1/10/2011', periods=10, freq='D')
In [139]: ts = pd.Series(range(len(rng)), index=rng)
In [140]: ts
Out[140]:
2011-01-10    0
2011-01-11    1
2011-01-12    2
2011-01-13    3
2011-01-14    4
2011-01-15    5
2011-01-16    6
2011-01-17    7
2011-01-18    8
2011-01-19    9
Freq: D, dtype: int64

If I use one of the rolling_* functions, for instance rolling_sum, I get the behavior I want for backward-looking rolling calculations:

In [157]: pd.rolling_sum(ts, window=3, min_periods=0)
Out[157]: 
2011-01-10     0
2011-01-11     1
2011-01-12     3
2011-01-13     6
2011-01-14     9
2011-01-15    12
2011-01-16    15
2011-01-17    18
2011-01-18    21
2011-01-19    24
Freq: D, dtype: float64

But what if I want to do a forward-looking sum? I've tried something like this:

In [161]: pd.rolling_sum(ts.shift(-2, freq='D'), window=3, min_periods=0)
Out[161]: 
2011-01-08     0
2011-01-09     1
2011-01-10     3
2011-01-11     6
2011-01-12     9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
Freq: D, dtype: float64

But that's not exactly the behavior I want. What I am looking for as an output is:

2011-01-10    3
2011-01-11    6
2011-01-12    9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
2011-01-18    17
2011-01-19    9

That is, I want the sum of the "current" day plus the next two days. My current solution falls short because I care about what happens at the edges. I know I could solve this manually by setting up two additional columns that are shifted by 1 and 2 days respectively and then summing the three columns, but there has to be a more elegant solution.
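
For reference, the manual approach described above can be sketched as follows. This is only an illustration, written against a recent pandas version: it shifts the values (not the index) and relies on the `fill_value` argument of `Series.shift` so the last rows stay numeric. The name `forward_sum` is made up for the example.

```python
import pandas as pd

rng = pd.date_range('1/10/2011', periods=10, freq='D')
ts = pd.Series(range(len(rng)), index=rng)

# Pull the next day's and the day-after's values up onto each row.
# fill_value=0 means the missing "future" days at the end contribute nothing.
next_1 = ts.shift(-1, fill_value=0)
next_2 = ts.shift(-2, fill_value=0)

# Current day plus the next two days.
forward_sum = ts + next_1 + next_2
print(forward_sum)
```

This reproduces the desired output, but the number of shifted copies grows with the window size, which is why it does not scale well.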

Answer 1

Answered by Andy Hayden

Why not just do it on the reversed Series (and reverse the answer):

In [11]: pd.rolling_sum(ts[::-1], window=3, min_periods=0)[::-1]
Out[11]:
2011-01-10     3
2011-01-11     6
2011-01-12     9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
2011-01-18    17
2011-01-19     9
Freq: D, dtype: float64
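
The pd.rolling_* functions were deprecated and later removed from pandas in favor of the `.rolling()` method, so on a current installation the same reversal trick might look like the sketch below. The second variant uses `FixedForwardWindowIndexer` from `pandas.api.indexers`, which (assuming pandas 1.1 or later) defines a forward-looking window directly. `min_periods=1` is used here so that partial windows at the end still produce a sum.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

rng = pd.date_range('1/10/2011', periods=10, freq='D')
ts = pd.Series(range(len(rng)), index=rng)

# Reversal trick with the method-style API: reverse, roll backward, reverse again.
reversed_sum = ts.iloc[::-1].rolling(window=3, min_periods=1).sum().iloc[::-1]

# Alternative without reversing: a window that covers the current row
# and the two rows after it.
indexer = FixedForwardWindowIndexer(window_size=3)
forward_sum = ts.rolling(window=indexer, min_periods=1).sum()

print(reversed_sum)
print(forward_sum)
```

Both variants should give the same values as the output shown above.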

Answer 2

Answered by Tom

Maybe you can try the bottleneck module. When ts is large, bottleneck is much faster than pandas.

import bottleneck as bn
# move_sum returns a NumPy array, so the date index of ts is not carried over
result = bn.move_sum(ts[::-1], window=3, min_count=1)[::-1]

bottleneck also has other rolling functions, such as move_max, move_argmin, and move_rank.

Answer 3

Answered by MitchellRosenthal256

I struggled with this then found an easy way using shift.

If you want a rolling sum for the next 10 periods, try:

df['NewCol'] = df['OtherCol'].shift(-10).rolling(10, min_periods = 0).sum()

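We use shift(-10) so that each value of "OtherCol" shows up 10 rows earlier than it normally would, then we take a rolling sum over the previous 10 rows. Because of the shift, those previous 10 rows are actually the next 10 rows of the unshifted column. :) Note that this holds only where the window is full: in the first 9 rows the window is cut off at the top of the column, so the sum there covers fewer than 10 future rows.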
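
To see which rows the shift-then-roll one-liner actually sums, here is a small check with a look-ahead of 3 instead of 10 so the numbers are easy to follow. The DataFrame contents, the variable `n`, and the column names `ShiftThenRoll` and `RollThenShift` are made up for the illustration. The second column is a possible alternative, not part of the answer above: it rolls first and shifts afterward.

```python
import pandas as pd

df = pd.DataFrame({'OtherCol': range(10)})
n = 3  # look-ahead length, kept small so the result can be checked by hand

# The one-liner from the answer, with 10 replaced by n.
df['ShiftThenRoll'] = df['OtherCol'].shift(-n).rolling(n, min_periods=0).sum()

# Alternative ordering: compute the backward-looking sum, then move it up n rows.
df['RollThenShift'] = df['OtherCol'].rolling(n, min_periods=1).sum().shift(-n)

print(df)
```

In row 0 the next three values are 1, 2 and 3, which sum to 6. `RollThenShift` gives 6 there, while `ShiftThenRoll` gives 3 because its window has only one value to work with at the top of the column. The trade-off is at the other end: `RollThenShift` leaves the last n rows as NaN, whereas `ShiftThenRoll` still returns partial sums there.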

