Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38509107/

Date: 2020-09-14 01:38:16  Source: igfitidea

Sliding window iterator using rolling in pandas

Tags: python, pandas, numpy, dataframe, pandas-groupby

Asked by Dzung Nguyen

To iterate one row at a time, I can get an iterator as follows:

import pandas as pd
import numpy as np

a = np.zeros((100,40))
X = pd.DataFrame(a)

for index, row in X.iterrows():
    print(index)
    print(row)

Now I want each iteration to return a subset of rows instead: X[0:9, :], X[5:14, :], X[10:19, :], etc. How do I achieve this with rolling (pandas.DataFrame.rolling)?

Accepted answer by piRSquared

I'll experiment with the following dataframe.

Setup

import pandas as pd
import numpy as np
from string import ascii_uppercase as uppercase  # string.uppercase no longer exists in Python 3

def generic_portfolio_df(start, end, freq, num_port, num_sec, seed=314):
    np.random.seed(seed)
    portfolios = pd.Index(['Portfolio {}'.format(i) for i in uppercase[:num_port]],
                          name='Portfolio')
    securities = ['s{:02d}'.format(i) for i in range(num_sec)]
    dates = pd.date_range(start, end, freq=freq)
    return pd.DataFrame(np.random.rand(len(dates) * num_sec, num_port),
                        index=pd.MultiIndex.from_product([dates, securities],
                                                         names=['Date', 'Id']),
                        columns=portfolios
                       ).groupby(level=0).transform(lambda x: x / x.sum())  # per date, scale each column to sum to 1


df = generic_portfolio_df('2014-12-31', '2015-05-30', 'BM', 3, 5)

df.head(10)

[Image: output of df.head(10)]

I'll now introduce a function that rolls over a number of rows and concatenates them into a single dataframe. It adds a top level to the column index that indicates the position within the roll.

Solution Step-1

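def rolled(df, n):
    # All column levels, as a list so that stack() accepts it on Python 3
    k = list(range(df.columns.nlevels))
    # The same levels counted from the end, used to unstack them again
    _k = [i - len(k) for i in k]
    # Key i holds the frame shifted down by i rows
    myroll = pd.concat([df.shift(i).stack(level=k) for i in range(n)],
                       axis=1, keys=range(n)).unstack(level=_k)
    # One (index label, window frame) pair per original row
    return [(i, row.unstack(0)) for i, row in myroll.iterrows()]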

Though it's hidden inside the function, myroll would look like this:

[Image: the myroll dataframe]
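
Since the screenshot is not reproduced here, the following is a minimal sketch of a similar wide layout on a hypothetical toy frame (`toy` and `wide` are illustrative names, not part of the answer). It skips the stack/unstack step and simply concatenates shifted copies under keys that give the position in the roll; for single-level columns this should produce the same two-level column layout.

```python
import pandas as pd

# Hypothetical toy data, not the portfolio frame used in the answer
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Key 0 is the current row, key 1 is the previous row (shifted down by one)
wide = pd.concat([toy.shift(i) for i in range(2)], axis=1, keys=[0, 1])
print(wide)
```

Each row of `wide` now carries its own values under key 0 and the previous row's values under key 1, with NaN where no earlier row exists.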

Now we can use it just like an iterator.

Solution Step-2

for i, roll in rolled(df.head(5), 3):
    print(roll)
    print()

                    0   1   2
Portfolio                    
Portfolio A  0.326164 NaN NaN
Portfolio B  0.201597 NaN NaN
Portfolio C  0.085340 NaN NaN

                    0         1   2
Portfolio                          
Portfolio A  0.278614  0.326164 NaN
Portfolio B  0.314448  0.201597 NaN
Portfolio C  0.266392  0.085340 NaN

                    0         1         2
Portfolio                                
Portfolio A  0.258958  0.278614  0.326164
Portfolio B  0.089224  0.314448  0.201597
Portfolio C  0.293570  0.266392  0.085340

                    0         1         2
Portfolio                                
Portfolio A  0.092760  0.258958  0.278614
Portfolio B  0.262511  0.089224  0.314448
Portfolio C  0.084208  0.293570  0.266392

                    0         1         2
Portfolio                                
Portfolio A  0.043503  0.092760  0.258958
Portfolio B  0.132221  0.262511  0.089224
Portfolio C  0.270490  0.084208  0.293570

Answer by Alex

That's not how rolling works. It "provides rolling transformations" (from the docs).
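
To illustrate the point, here is a small sketch (the frame `s` is made up for this example): rolling feeds each window to an aggregation such as sum and returns one value per row, rather than handing back the window itself.

```python
import pandas as pd

# Made-up example data
s = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

# Each output row is the sum of the current row and the two rows before it;
# the first two rows have no full window, so they come out as NaN
rolled_sum = s.rolling(window=3).sum()
print(rolled_sum)
```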

You could loop and use pandas indexing instead:

for i in range((X.shape[0] + 9) // 10):
    X_subset = X.iloc[i * 10: (i + 1) * 10]
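
Note that the loop above produces non-overlapping blocks of 10 rows, while the windows in the question overlap. A minimal sketch of an overlapping version, assuming a window of 10 rows advanced 5 rows at a time; `sliding_windows` is a hypothetical helper, not a pandas function:

```python
import numpy as np
import pandas as pd

def sliding_windows(frame, size, step):
    """Yield (start, window) pairs of `size` consecutive rows, advancing by `step`."""
    # Stop early enough that every window has exactly `size` rows
    for start in range(0, len(frame) - size + 1, step):
        yield start, frame.iloc[start:start + size]

X = pd.DataFrame(np.zeros((100, 40)))
windows = list(sliding_windows(X, size=10, step=5))

for start, window in windows[:3]:
    # Show the first and last row label of each window
    print(start, window.index[0], window.index[-1])
```

Writing it as a generator keeps only one window in play at a time, and `iloc` slicing avoids building the wide intermediate frame.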