Sliding window over a Pandas dataframe

Question

Asked by cs_stackX

I have a large pandas dataframe of time-series data.

I currently manipulate this dataframe to create a new, smaller dataframe that holds the average of every 10 rows, i.e. a rolling-window technique. Like this:

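import numpy as np
import pandas as pd

def create_new_df(df):
    features = []
    x = df['X'].astype(float)
    i = x.index.values
    # Repeat each index label 10 times, then truncate to len(x),
    # so that each consecutive run of 10 rows shares one group key
    time_sequence = [i] * 10
    idx = np.array(time_sequence).T.flatten()[:len(x)]
    # Average each block of 10 rows
    x = x.groupby(idx).mean()
    x.name = 'X'
    features.append(x)
    new_df = pd.concat(features, axis=1)
    return new_df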

Code to test:

columns = ['X']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(20)]*1).T
df = pd.DataFrame(data, columns=columns)

test = create_new_df(df)
print(test)

Output:

      X
0   4.5
1  14.5
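
For reference, the same non-overlapping block means can probably be obtained more directly by grouping on the integer division of the row position. This is only a sketch: it assumes a single numeric column named 'X' as in the test data, and the names block_ids and block_means are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(20)})

# Row positions 0-9 map to group 0, positions 10-19 to group 1, and so on.
block_ids = np.arange(len(df)) // 10

# Average each block of 10 rows and keep the result as a one-column dataframe.
block_means = df.groupby(block_ids)['X'].mean().to_frame()
print(block_means)
```

With the 20-row test frame this gives the same two values as create_new_df, 4.5 and 14.5.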

However, I want the function to build the new dataframe using a sliding window with a 50% overlap.

So the output would look like this:

      X
0   4.5
1   9.5
2  14.5

How can I do this?

Here's what I've tried:

from itertools import tee

def window(iterable, size):
    # Make `size` independent iterators and advance the i-th one by i items
    iters = tee(iterable, size)
    for i in range(1, size):
        for each in iters[i:]:
            next(each, None)
    return zip(*iters)

for each in window(df, 20):
    print(list(each)) # doesn't have the desired sliding window effect

Some might also suggest using the pandas rolling_mean() method, but if so, I can't see how to use this function with window overlap.

Any help would be much appreciated.

Answer 1

Answered by JohnE

I think pandas rolling techniques are fine here: compute the rolling mean over 10 rows, then keep every 5th row to get the 50% overlap. Note that starting with version 0.18.0 of pandas, you would use rolling().mean() instead of rolling_mean().

>>> df = pd.DataFrame({'x': range(30)})
>>> df = df.rolling(10).mean()           # version 0.18.0 syntax
>>> df[4::5]                             # take every 5th row

       x
4    NaN
9    4.5
14   9.5
19  14.5
24  19.5
29  24.5
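
The NaN at row 4 comes from the first window being incomplete (only 5 rows are available there). A minimal sketch of how the same idea might be wrapped into a function that skips incomplete windows and renumbers the rows follows. The function name overlapping_window_mean and its window and step parameters are hypothetical and not part of pandas; only rolling().mean(), iloc and reset_index() are pandas calls.

```python
import pandas as pd


def overlapping_window_mean(df, window=10, step=5):
    """Mean over windows of `window` rows, advancing `step` rows each time.

    Hypothetical helper for illustration; step = window // 2 gives 50% overlap.
    """
    rolled = df.rolling(window).mean()
    # The first complete window ends at row position window - 1, so start
    # there to skip the NaN rows produced by incomplete windows.
    result = rolled.iloc[window - 1::step]
    # Renumber the rows 0, 1, 2, ... as in the question's desired output.
    return result.reset_index(drop=True)


df = pd.DataFrame({'X': range(20)})
result = overlapping_window_mean(df)
print(result)
```

On the question's 20-row test data this yields three rows, 4.5, 9.5 and 14.5, which are the means of rows 0-9, 5-14 and 10-19.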
