pandas: Calculating rolling correlation with pandas

Question

Asked by Wade Bratz

I have a list of 10 stocks differentiated by PERMNO. I would like to group those stocks by PERMNO and calculate the rolling correlation between the stock return (RET) for each PERMNO with the market return (vwretd). The code I am trying is below.

CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)

The error I am getting is below.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
      1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
      3 CRSP.head(20)

C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
    296 
    297         raise AttributeError("%r object has no attribute %r" %
--> 298                              (type(self).__name__, attr))
    299 
    300     def __getitem__(self, key):

AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'

Please help!

Thanks

Answer 1

Answered by Parikshit Bhinde

Running rolling_corr() on Python 3.5 generates a warning that the function is deprecated and may stop working in the future. Using Series.rolling(window=<period>).corr(other=series) instead is recommended. For example:

data['scrip1DailyReturn'].rolling(window=90).corr(other=data['scrip2DailyReturn'])
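
To make the windowed call concrete, here is a minimal sketch on made-up numbers. The column names stock and market and the window of 3 are illustrative assumptions, not taken from the question. Each output value is the Pearson correlation over the trailing window of rows, and the first window - 1 entries are NaN because min_periods defaults to the window size.

```python
import numpy as np
import pandas as pd

# Made-up returns, purely for illustration (not from the question's data).
data = pd.DataFrame({
    "stock":  [0.01, -0.02, 0.03, 0.00, 0.02, -0.01],
    "market": [0.02, -0.01, 0.01, 0.01, 0.03, -0.02],
})

window = 3
rolling = data["stock"].rolling(window=window).corr(other=data["market"])

# Cross-check the last rolling value against a direct Pearson correlation
# computed over the final `window` rows.
direct = np.corrcoef(
    data["stock"].iloc[-window:], data["market"].iloc[-window:]
)[0, 1]

print(rolling)
print(direct)
```

The last entry of rolling and the direct value should agree up to floating-point error, which is a quick way to sanity-check the window length you chose.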

Answer 2

Answered by Jerome Montino

Use pandas.rolling_corr, not DataFrame.rolling_corr. Also, groupby returns a GroupBy object that you iterate over as (key, group) pairs, not a dataframe. See the code below.

Code:

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

# Each iteration yields the group key and that group's sub-dataframe.
for key, value in df_gen:
    print("key: {}".format(key))
    print(pd.rolling_corr(value["Value1"], value["Value2"], 3))

Output:

key: Blue
1          NaN
3          NaN
6     0.931673
8     0.865066
10    0.089304
12   -0.998656
15   -0.971373
17   -0.667316
dtype: float64
key: Red
0          NaN
2          NaN
5    -0.911357
9    -0.152221
11   -0.971153
14    0.438697
18   -0.550727
dtype: float64
key: Yellow
4          NaN
7          NaN
13   -0.040330
16    0.879371
dtype: float64

You can change the loop to the following to view each group of the original dataframe with the rolling correlation added as a new column.

for key, value in df_gen:
    print("key: {}".format(key))
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)

Output:

key: Blue
   Color    Value1    Value2  ROLL_CORR
1   Blue  0.951227  0.514999        NaN
3   Blue  0.649112  0.513052        NaN
6   Blue  0.148165  0.342205   0.931673
8   Blue  0.626883  0.421530   0.865066
10  Blue  0.286738  0.583811   0.089304
12  Blue  0.966779  0.227340  -0.998656
15  Blue  0.065493  0.887640  -0.971373
17  Blue  0.757932  0.900103  -0.667316
key: Red
   Color    Value1    Value2  ROLL_CORR
0    Red  0.201435  0.981871        NaN
2    Red  0.522955  0.357239        NaN
5    Red  0.806326  0.310039  -0.911357
9    Red  0.656126  0.678047  -0.152221
11   Red  0.435898  0.908388  -0.971153
14   Red  0.116419  0.555821   0.438697
18   Red  0.793102  0.168033  -0.550727
key: Yellow
     Color    Value1    Value2  ROLL_CORR
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

If you want to join them all together after processing (this might be confusing to others, by the way), just use concat after processing the groups.

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

dfs = []  # Container for the per-group dataframes.

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)
    dfs.append(value)

df_final = pd.concat(dfs)
print(df_final)

Output:

     Color    Value1    Value2  ROLL_CORR
1     Blue  0.951227  0.514999        NaN
3     Blue  0.649112  0.513052        NaN
6     Blue  0.148165  0.342205   0.931673
8     Blue  0.626883  0.421530   0.865066
10    Blue  0.286738  0.583811   0.089304
12    Blue  0.966779  0.227340  -0.998656
15    Blue  0.065493  0.887640  -0.971373
17    Blue  0.757932  0.900103  -0.667316
0      Red  0.201435  0.981871        NaN
2      Red  0.522955  0.357239        NaN
5      Red  0.806326  0.310039  -0.911357
9      Red  0.656126  0.678047  -0.152221
11     Red  0.435898  0.908388  -0.971153
14     Red  0.116419  0.555821   0.438697
18     Red  0.793102  0.168033  -0.550727
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371
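
On newer pandas releases, where the top-level pd.rolling_corr function is no longer available (it was deprecated in 0.18 and later removed), the same loop-and-concat pattern can be written with Series.rolling(...).corr(...). A minimal sketch, assuming a small made-up frame in place of color.csv; its values are illustrative and are not the data behind the output above:

```python
import pandas as pd

# Made-up stand-in for color.csv (the original file is not available here).
df = pd.DataFrame({
    "Color":  ["Red", "Blue", "Red", "Blue", "Red", "Blue", "Red", "Blue"],
    "Value1": [0.2, 0.9, 0.5, 0.6, 0.8, 0.1, 0.7, 0.6],
    "Value2": [0.9, 0.5, 0.4, 0.5, 0.3, 0.3, 0.7, 0.4],
})

dfs = []  # Container for the per-group dataframes.

for key, group in df.groupby("Color"):
    group = group.copy()  # Work on a copy so the new column is not set on a slice.
    group["ROLL_CORR"] = group["Value1"].rolling(window=3).corr(other=group["Value2"])
    dfs.append(group)

# Rows come back ordered by group; the original row labels are preserved.
df_final = pd.concat(dfs)
print(df_final)
```

Because the window is applied inside each group, the first two rows of every colour are NaN and no window ever mixes rows from two colours.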

Hope this helps.

Answer 3

Answered by Wade Bratz

I found an efficient solution. Fairly simple.

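def roll_corr_groupby(x, i):
    # Rolling correlation of 'col 1' and 'col 2' over a window of i rows.
    x['Z'] = x['col 1'].rolling(window=i).corr(other=x['col 2'])
    return x

# apply() forwards extra arguments to the function; 10 is an example window.
# Assign the result so the new column is kept.
x = x.groupby(['key']).apply(roll_corr_groupby, 10)
x.head()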
