pandas: sliding window iterator using rolling
Original URL: http://stackoverflow.com/questions/38509107/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Sliding window iterator using rolling in pandas
Asked by Dzung Nguyen
To go through the frame one row at a time, I can get an iterator as follows:
import pandas as pd
import numpy as np
a = np.zeros((100, 40))
X = pd.DataFrame(a)
# iterrows() yields one (index label, row Series) pair per row
for index, row in X.iterrows():
    print(index)
    print(row)
Now I want each iteration to return a subset of rows instead: X[0:9, :], X[5:14, :], X[10:19, :], etc. How do I achieve this with rolling (pandas.DataFrame.rolling)?
Accepted answer by piRSquared
I'll experiment with the following dataframe.
Setup
import pandas as pd
import numpy as np
from string import ascii_uppercase  # Python 3 name for string.uppercase
def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    # Random weights, normalized so each portfolio sums to 1 on every date.
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'.format(i) for i in ascii_uppercase[:num_port]],
                          name='Portfolio')
    securities = ['s{:02d}'.format(i) for i in range(num_sec)]
    dates = pd.date_range(start, end, freq=freq)
    # group_keys=False keeps the original (Date, Id) index in newer pandas.
    return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
                        index=pd.MultiIndex.from_product([dates, securities],
                                                         names=['Date', 'Id']),
                        columns=portfolios
                        ).groupby(level=0, group_keys=False).apply(lambda x: x / x.sum())
# 'BM' is business month end; newer pandas versions spell this alias 'BME'.
df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)
df.head(10)
I'll now introduce a function to roll a number of rows and concatenate into a single dataframe where I'll add a top level to the column index that indicates the location in the roll.
Solution Step-1
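def rolled(df, n):
    # k: every column level; _k: those same levels counted from the end
    # of the row index once they have been stacked into it.
    k = list(range(df.columns.nlevels))
    _k = [i - len(k) for i in k]
    # Shift by 0..n-1 rows, stack the columns into the index, place the
    # copies side by side keyed by roll position, then unstack the columns.
    myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
                       axis=1, keys=range(n)).unstack(level=_k)
    # One (index, window) pair per row, with roll positions as columns.
    return [(i, row.unstack(0)) for i, row in myroll.iterrows()]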
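Though it's hidden in the function, myroll is a single wide dataframe: one row per original row, with columns keyed first by the position in the roll (0 to n-1) and then by the original column labels.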
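To make the shape of myroll easier to picture, here is a minimal sketch of the same shift-and-concatenate idea on a tiny frame with single-level columns. The frame `small` and its values are made up for illustration and are not the portfolio data above; the stack/unstack handling of multi-level columns is left out on purpose.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
small = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                      "b": [10.0, 20.0, 30.0, 40.0]})

n = 3  # window length

# shift(i) moves every row down by i, so row t of the i-th copy holds row t - i.
# keys= adds a top column level recording the position in the roll.
wide = pd.concat([small.shift(i) for i in range(n)], axis=1, keys=range(n))
print(wide)

# Row 2 is the first row with a complete window (rows 2, 1 and 0).
# unstack(0) turns the roll position into columns, as rolled() does per row.
window = wide.loc[2].unstack(0)
print(window)
```

The first n - 1 rows of `wide` contain NaN because there are no earlier rows to shift in, which is why the first windows in the output of rolled are partly empty.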
Now we can use it just like an iterator.
Solution Step-2
for i, roll in rolled(df.head(5), 3):
    print(roll)
    print()
                    0         1         2
Portfolio
Portfolio A  0.326164       NaN       NaN
Portfolio B  0.201597       NaN       NaN
Portfolio C  0.085340       NaN       NaN

                    0         1         2
Portfolio
Portfolio A  0.278614  0.326164       NaN
Portfolio B  0.314448  0.201597       NaN
Portfolio C  0.266392  0.085340       NaN

                    0         1         2
Portfolio
Portfolio A  0.258958  0.278614  0.326164
Portfolio B  0.089224  0.314448  0.201597
Portfolio C  0.293570  0.266392  0.085340

                    0         1         2
Portfolio
Portfolio A  0.092760  0.258958  0.278614
Portfolio B  0.262511  0.089224  0.314448
Portfolio C  0.084208  0.293570  0.266392

                    0         1         2
Portfolio
Portfolio A  0.043503  0.092760  0.258958
Portfolio B  0.132221  0.262511  0.089224
Portfolio C  0.270490  0.084208  0.293570
Answer by Alex
That's not how rolling works. It "provides rolling transformations" (from the docs).
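For what it's worth, newer pandas releases (1.1 and later, as far as I know) also let you iterate over a Rolling object directly, which yields one sub-frame per row. The sketch below assumes such a version; the frame and the name `full_windows` are made up for illustration. The first windows are shorter than the requested size, so they are filtered out here.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
X = pd.DataFrame({"x": range(6)})
size = 3

# Iterating a Rolling object yields a DataFrame for each row position.
# The first size - 1 windows are partial, so keep only the full ones.
full_windows = [w for w in X.rolling(size) if len(w) == size]

for w in full_windows:
    print(w.index.tolist())
```

This still advances one row at a time, so a step of 5 as in the question would need the windows to be thinned out afterwards, for example by slicing the list.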
Could you loop and use pandas indexing instead?
# Non-overlapping blocks of 10 rows: 0-9, 10-19, ...
for i in range((X.shape[0] + 9) // 10):
    X_subset = X.iloc[i * 10: (i + 1) * 10]
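The loop above steps by a whole block each time. A minimal sketch of the overlapping version the question asks about (10 rows per window, advancing 5 rows) could look like the following. The helper name `sliding_windows` and its parameters are hypothetical, and it assumes X[0:9, :] in the question means rows 0 through 9 inclusive.

```python
import numpy as np
import pandas as pd

def sliding_windows(df, size, step):
    """Yield (start, window) pairs of `size` rows, advancing `step` rows each time."""
    # Stop once a full window no longer fits; trailing partial windows are skipped.
    for start in range(0, len(df) - size + 1, step):
        # iloc slicing excludes the end position, so this selects `size` rows.
        yield start, df.iloc[start:start + size]

X = pd.DataFrame(np.zeros((100, 40)))
for start, window in sliding_windows(X, size=10, step=5):
    print(start, window.shape)
```

Because it is a generator, each window is sliced only when the loop asks for it, so nothing is built up front for large frames.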