Original URL: http://stackoverflow.com/questions/36937869/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Sliding Window over Pandas Dataframe
Asked by cs_stackX
I have a large pandas dataframe of time-series data.
I currently manipulate this dataframe to create a new, smaller dataframe that holds the average of every 10 rows, i.e. a rolling window technique. Like this:
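import numpy as np
import pandas as pd

def create_new_df(df):
    features = []
    x = df['X'].astype(float)
    i = x.index.values
    # Build group labels: the first 10 rows share the first index label, the next 10 the second, ...
    time_sequence = [i] * 10
    idx = np.array(time_sequence).T.flatten()[:len(x)]
    # Average each group of 10 consecutive rows
    x = x.groupby(idx).mean()
    x.name = 'X'
    features.append(x)
    new_df = pd.concat(features, axis=1)
    return new_df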
Code to test:
columns = ['X']
df_ = pd.DataFrame(columns=columns)
df_ = df_.fillna(0)  # with 0s rather than NaNs
data = np.array([np.arange(20)] * 1).T
df = pd.DataFrame(data, columns=columns)
test = create_new_df(df)
print(test)
Output:
      X
0   4.5
1  14.5
However, I want the function to build the new dataframe using a sliding window with a 50% overlap.
So the output would look like this:
      X
0   4.5
1   9.5
2  14.5
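To make the target concrete: with a window of 10 rows and a 50% overlap, each window starts 5 rows after the previous one, so the three windows cover rows 0-9, 5-14, and 10-19. A minimal sketch of that arithmetic, assuming the same 20-row test frame (the names `window`, `step`, and `starts` are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.arange(20)})

window = 10          # rows per window
step = window // 2   # 50% overlap: each window starts 5 rows after the previous one

# Start positions of every full window: 0, 5, 10
starts = range(0, len(df) - window + 1, step)

# Mean of each window, computed with an explicit slice
means = [df['X'].iloc[s:s + window].mean() for s in starts]

result = pd.DataFrame({'X': means})
print(result)
```

The explicit loop is only meant to spell out which rows each window covers; it is not a suggestion for how to process a large dataframe.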
How can I do this?
Here's what I've tried:
from itertools import tee

def window(iterable, size):
    # Make `size` independent iterators and advance the i-th one by i items
    iters = tee(iterable, size)
    for i in range(1, size):
        for each in iters[i:]:
            next(each, None)
    return zip(*iters)

for each in window(df, 20):
    print(list(each))  # doesn't have the desired sliding window effect
Some might also suggest using the pandas rolling_mean() method, but if so, I can't see how to use this function with window overlap.
Any help would be much appreciated.
Answered by JohnE
I think pandas rolling techniques are fine here. Note that starting with version 0.18.0 of pandas, you would use rolling().mean() instead of rolling_mean().
>>> import pandas as pd
>>> df = pd.DataFrame({'x': range(30)})
>>> df = df.rolling(10).mean()  # version 0.18.0 syntax
>>> df[4::5]                    # take every 5th row
       x
4    NaN
9    4.5
14   9.5
19  14.5
24  19.5
29  24.5
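The first row of that result is NaN because row 4 does not yet have a full 10-row window behind it. To get exactly the output asked for in the question, one option is to start slicing at the first complete window and reset the index. A minimal sketch, assuming a default integer index; the function name `overlapping_window_mean` and its `window` and `overlap` parameters are hypothetical helpers, not part of pandas or of the original answer:

```python
import numpy as np
import pandas as pd

def overlapping_window_mean(df, window=10, overlap=0.5):
    """Mean of each `window`-row block, where consecutive blocks share `overlap` of their rows."""
    # Distance between window starts; round() guards against float error such as 10 * (1 - 0.9)
    step = int(round(window * (1 - overlap)))
    if step < 1:
        raise ValueError("overlap is too large for this window size")
    rolled = df.rolling(window).mean()
    # Row window-1 is the first one with a full window behind it;
    # after that, a new window ends every `step` rows.
    return rolled.iloc[window - 1::step].reset_index(drop=True)

df = pd.DataFrame({'X': np.arange(20)})
result = overlapping_window_mean(df)
print(result)
```

With `overlap=0.0` the same helper reduces to the non-overlapping averages from the question (4.5 and 14.5 for the 20-row test frame).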