Original question: http://stackoverflow.com/questions/27069003/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Calculate rolling correlation with pandas
Asked by Wade Bratz
I have a list of 10 stocks differentiated by PERMNO. I would like to group those stocks by PERMNO and calculate the rolling correlation between the stock return (RET) for each PERMNO with the market return (vwretd). The code I am trying is below.
CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)
The error I am getting is below.
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
      1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
      3 CRSP.head(20)
C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
    296 
    297         raise AttributeError("%r object has no attribute %r" %
--> 298                              (type(self).__name__, attr))
    299 
    300     def __getitem__(self, key):
AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'
Please help!
Thanks
Answered by Parikshit Bhinde
Running rolling_corr() on Python 3.5 generates a warning that the function is deprecated and may stop working in the future. Using Series.rolling(window=<period>).corr(other=series) instead is recommended.
E.g.
data['scrip1DailyReturn'].rolling(window=90).corr(other=data['scrip2DailyReturn'])
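To make the recommended call concrete, here is a minimal sketch on synthetic data. The column names are taken from the line above; the random values, the window of 5, and the cross-check against np.corrcoef are assumptions added for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns; the values are made up for illustration.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "scrip1DailyReturn": rng.normal(size=20),
    "scrip2DailyReturn": rng.normal(size=20),
})

window = 5
rolling_corr = data["scrip1DailyReturn"].rolling(window=window).corr(
    other=data["scrip2DailyReturn"]
)

# Cross-check the last value against a direct Pearson correlation
# computed over the last `window` rows.
expected_last = np.corrcoef(
    data["scrip1DailyReturn"].iloc[-window:],
    data["scrip2DailyReturn"].iloc[-window:],
)[0, 1]

print(rolling_corr.tail(3))
print(expected_last)
```

The first window - 1 entries of the result are NaN because a full window is not yet available, which matches the leading NaN rows in the outputs shown elsewhere in this thread.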
Answered by Jerome Montino
Use pandas.rolling_corr, not DataFrame.rolling_corr. Besides, groupby does not return a DataFrame: it returns a GroupBy object that you iterate over to get (key, sub-frame) pairs. See the code below.
Code:
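import pandas as pd
df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
for key, value in df_gen:
    print("key: {}".format(key))
    # pd.rolling_corr is the legacy module-level function; later pandas releases
    # replace it with value["Value1"].rolling(3).corr(value["Value2"]).
    print(pd.rolling_corr(value["Value1"], value["Value2"], 3))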
Output:
key: Blue
1          NaN
3          NaN
6     0.931673
8     0.865066
10    0.089304
12   -0.998656
15   -0.971373
17   -0.667316
dtype: float64
key: Red
0          NaN
2          NaN
5    -0.911357
9    -0.152221
11   -0.971153
14    0.438697
18   -0.550727
dtype: float64
key: Yellow
4          NaN
7          NaN
13   -0.040330
16    0.879371
dtype: float64
You can change the loop to the following to also view each group of the original DataFrame with the rolling correlation added as a new column.
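for key, value in df_gen:
    print("key: {}".format(key))
    value = value.copy()  # work on a copy of the group before adding a column
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)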
Output:
key: Blue
   Color    Value1    Value2  ROLL_CORR
1   Blue  0.951227  0.514999        NaN
3   Blue  0.649112  0.513052        NaN
6   Blue  0.148165  0.342205   0.931673
8   Blue  0.626883  0.421530   0.865066
10  Blue  0.286738  0.583811   0.089304
12  Blue  0.966779  0.227340  -0.998656
15  Blue  0.065493  0.887640  -0.971373
17  Blue  0.757932  0.900103  -0.667316
key: Red
   Color    Value1    Value2  ROLL_CORR
0    Red  0.201435  0.981871        NaN
2    Red  0.522955  0.357239        NaN
5    Red  0.806326  0.310039  -0.911357
9    Red  0.656126  0.678047  -0.152221
11   Red  0.435898  0.908388  -0.971153
14   Red  0.116419  0.555821   0.438697
18   Red  0.793102  0.168033  -0.550727
key: Yellow
     Color    Value1    Value2  ROLL_CORR
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371
If you want to join them all together after processing (this might be confusing to others, by the way), just use concat once all the groups have been processed.
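import pandas as pd
df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
dfs = []  # Container for the per-group DataFrames.
for key, value in df_gen:
    value = value.copy()  # work on a copy of the group before adding a column
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)
    dfs.append(value)
# Stack the processed groups back into a single DataFrame.
df_final = pd.concat(dfs)
print(df_final)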
Output:
     Color    Value1    Value2  ROLL_CORR
1     Blue  0.951227  0.514999        NaN
3     Blue  0.649112  0.513052        NaN
6     Blue  0.148165  0.342205   0.931673
8     Blue  0.626883  0.421530   0.865066
10    Blue  0.286738  0.583811   0.089304
12    Blue  0.966779  0.227340  -0.998656
15    Blue  0.065493  0.887640  -0.971373
17    Blue  0.757932  0.900103  -0.667316
0      Red  0.201435  0.981871        NaN
2      Red  0.522955  0.357239        NaN
5      Red  0.806326  0.310039  -0.911357
9      Red  0.656126  0.678047  -0.152221
11     Red  0.435898  0.908388  -0.971153
14     Red  0.116419  0.555821   0.438697
18     Red  0.793102  0.168033  -0.550727
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371
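Note that the concatenated output above is ordered group by group rather than in the original row order. The sketch below reproduces the loop-and-concat approach with the newer Series.rolling(...).corr(...) call and then restores the original order with sort_index. The small DataFrame is a made-up stand-in for color.csv (which is not available here), and the name df_restored is an assumption for illustration.

```python
import pandas as pd

# Made-up stand-in for color.csv; the values are illustrative only.
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Red", "Blue", "Yellow", "Red",
              "Blue", "Yellow", "Blue", "Red", "Yellow", "Yellow"],
    "Value1": [0.20, 0.95, 0.52, 0.65, 0.10, 0.81,
               0.15, 0.07, 0.63, 0.66, 0.01, 0.35],
    "Value2": [0.98, 0.51, 0.36, 0.51, 0.14, 0.31,
               0.34, 0.75, 0.42, 0.68, 0.32, 0.99],
})

dfs = []  # one processed DataFrame per group
for key, group in df.groupby("Color"):
    group = group.copy()  # work on a copy so the column assignment is explicit
    group["ROLL_CORR"] = group["Value1"].rolling(window=3).corr(group["Value2"])
    dfs.append(group)

df_final = pd.concat(dfs)            # rows ordered group by group
df_restored = df_final.sort_index()  # rows back in their original order

print(df_final)
print(df_restored)
```

Because each group keeps its original index labels, sort_index is enough to line the new column up with the source rows again.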
Hope this helps.
Answered by Wade Bratz
I found an efficient solution. Fairly simple.
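import pandas as pd

def roll_corr_groupby(x, i):
    # i is the window size; pd.rolling_corr is the legacy module-level function.
    x['Z'] = pd.rolling_corr(x['col 1'], x['col 2'], i)
    return x

# apply() forwards extra positional arguments, so the window (10 here) must be passed.
x = x.groupby(['key']).apply(roll_corr_groupby, 10)
x.head()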

