Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34802972/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion
提问by Carl
I have a large data frame, df, containing 4 columns:
我有一个大数据框 df,包含 4 列:
id period ret_1m mkt_ret_1m
131146 CAN00WG0 199609 -0.1538 0.047104
133530 CAN00WG0 199610 -0.0455 -0.014143
135913 CAN00WG0 199611 0.0000 0.040926
138334 CAN00WG0 199612 0.2952 0.008723
140794 CAN00WG0 199701 -0.0257 0.039916
143274 CAN00WG0 199702 -0.0038 -0.025442
145754 CAN00WG0 199703 -0.2992 -0.049279
148246 CAN00WG0 199704 -0.0919 -0.005948
150774 CAN00WG0 199705 0.0595 0.122322
153318 CAN00WG0 199706 -0.0337 0.045765
id period ret_1m mkt_ret_1m
160980 CAN00WH0 199709 0.0757 0.079293
163569 CAN00WH0 199710 -0.0741 -0.044000
166159 CAN00WH0 199711 0.1000 -0.014644
168782 CAN00WH0 199712 -0.0909 -0.007072
171399 CAN00WH0 199801 -0.0100 0.001381
174022 CAN00WH0 199802 0.1919 0.081924
176637 CAN00WH0 199803 0.0085 0.050415
179255 CAN00WH0 199804 -0.0168 0.018393
181880 CAN00WH0 199805 0.0427 -0.051279
184516 CAN00WH0 199806 -0.0656 -0.011516
id period ret_1m mkt_ret_1m
143275 CAN00WO0 199702 -0.1176 -0.025442
145755 CAN00WO0 199703 -0.0074 -0.049279
148247 CAN00WO0 199704 -0.0075 -0.005948
150775 CAN00WO0 199705 0.0451 0.122322
etc.
等等。
I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.
我正在尝试使用一个函数计算一个常见的财务指标,称为 beta,该函数采用两列,ret_1m,每月 stock_return 和 ret_1m_mkt,即同一时期 (period_id) 的市场 1 个月回报。我想应用一个函数 (calc_beta) 来计算这个函数在 12 个月滚动基础上的 12 个月结果。
To do this, I am creating a groupby object:
为此,我正在创建一个 groupby 对象:
grp = df.groupby('id')
What I would like to do is use something like:
我想做的是使用类似的东西:
period = 12
for stock, sub_df in grp:
arg = sub_df[['ret_1m', 'mkt_ret_1m']]
beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)
Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:
现在,这是第一个问题。根据文档, pd.rolling_apply arg 可以是系列或数据框。但是,我提供的数据框似乎被转换为一个只能包含一列数据的 numpy 数组,而不是我试图提供的两列数据。所以我下面的 calc_beta 代码将不起作用,因为我需要同时传递股票和市场回报:
def calc_beta(np_array)
s = np_array[:,0] # stock returns are column zero from numpy array
m = np_array[:,1] # market returns are column one from numpy array
covariance = np.cov(s,m) # Calculate covariance between stock and market
beta = covariance[0,1]/covariance[1,1]
return beta
So my questions are as follows, I think it makes sense to list them in this way:
所以我的问题如下,我认为这样列出它们是有道理的:
(i) How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function?
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this? I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.
I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.versionis 0.16.1.
我收集到 rolling_apply 以前无法处理数据帧,但文档表明它现在能够这样做。我的Pandas 版本是 0.16.1。
Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.
谢谢你的帮助!我已经失去了 1.5 天试图弄清楚这一点,我完全被难住了。
Ultimately, what I want is something like this:
最终,我想要的是这样的:
id period ret_1m mkt_ret_1m beta other_quantities
131146 CAN00WG0 199609 -0.1538 0.047104 0.521 xxx
133530 CAN00WG0 199610 -0.0455 -0.014143 0.627 xxxx
135913 CAN00WG0 199611 0.0000 0.040926 0.341 xxx
138334 CAN00WG0 199612 0.2952 0.008723 0.567 xx
140794 CAN00WG0 199701 -0.0257 0.039916 0.4612 xxx
143274 CAN00WG0 199702 -0.0038 -0.025442 0.215 xxx
145754 CAN00WG0 199703 -0.2992 -0.049279 0.4678 xxx
148246 CAN00WG0 199704 -0.0919 -0.005948 -0.4225 xxx
150774 CAN00WG0 199705 0.0595 0.122322 0.780 xxx
153318 CAN00WG0 199706 -0.0337 0.045765 0.623 xxx
id period ret_1m mkt_ret_1m beta other_quantities
160980 CAN00WH0 199709 0.0757 0.079293 -0.913 xx
163569 CAN00WH0 199710 -0.0741 -0.044000 0.894 xxx
166159 CAN00WH0 199711 0.1000 -0.014644 0.563 xxx
168782 CAN00WH0 199712 -0.0909 -0.007072 0.734 xxx
171399 CAN00WH0 199801 -0.0100 0.001381 0.894 xxxx
174022 CAN00WH0 199802 0.1919 0.081924 0.789 xx
176637 CAN00WH0 199803 0.0085 0.050415 0.1563 xxxx
179255 CAN00WH0 199804 -0.0168 0.018393 -0.64 xxxx
181880 CAN00WH0 199805 0.0427 -0.051279 -0.742 xxx
184516 CAN00WH0 199806 -0.0656 -0.011516 0.925 xxx
id period ret_1m mkt_ret_1m beta
143275 CAN00WO0 199702 -0.1176 -0.025442 -1.52 xx
145755 CAN00WO0 199703 -0.0074 -0.049279 -0.632 xxx
148247 CAN00WO0 199704 -0.0075 -0.005948 1.521 xx
150775 CAN00WO0 199705 0.0451 0.122322 0.0321 xxx
etc.
等等。
回答by Happy001
I guess pd.rolling_apply
doesn't help in this case since it seems to me that it essentially only takes a Series
(Even if a dataframe is passed, it's processing one column a time). But you can always write your own rolling_apply that takes a dataframe.
我想pd.rolling_apply
在这种情况下没有帮助,因为在我看来它基本上只需要一个Series
(即使传递了一个数据帧,它也一次处理一列)。但是您始终可以编写自己的带有数据帧的 rolling_apply。
import pandas as pd
import numpy as np
from StringIO import StringIO
df = pd.read_csv(StringIO(''' id period ret_1m mkt_ret_1m
131146 CAN00WG0 199609 -0.1538 0.047104
133530 CAN00WG0 199610 -0.0455 -0.014143
135913 CAN00WG0 199611 0.0000 0.040926
138334 CAN00WG0 199612 0.2952 0.008723
140794 CAN00WG0 199701 -0.0257 0.039916
143274 CAN00WG0 199702 -0.0038 -0.025442
145754 CAN00WG0 199703 -0.2992 -0.049279
148246 CAN00WG0 199704 -0.0919 -0.005948
150774 CAN00WG0 199705 0.0595 0.122322
153318 CAN00WG0 199706 -0.0337 0.045765
160980 CAN00WH0 199709 0.0757 0.079293
163569 CAN00WH0 199710 -0.0741 -0.044000
166159 CAN00WH0 199711 0.1000 -0.014644
168782 CAN00WH0 199712 -0.0909 -0.007072
171399 CAN00WH0 199801 -0.0100 0.001381
174022 CAN00WH0 199802 0.1919 0.081924
176637 CAN00WH0 199803 0.0085 0.050415
179255 CAN00WH0 199804 -0.0168 0.018393
181880 CAN00WH0 199805 0.0427 -0.051279
184516 CAN00WH0 199806 -0.0656 -0.011516
143275 CAN00WO0 199702 -0.1176 -0.025442
145755 CAN00WO0 199703 -0.0074 -0.049279
148247 CAN00WO0 199704 -0.0075 -0.005948
150775 CAN00WO0 199705 0.0451 0.122322'''), sep='\s+')
def calc_beta(df):
np_array = df.values
s = np_array[:,0] # stock returns are column zero from numpy array
m = np_array[:,1] # market returns are column one from numpy array
covariance = np.cov(s,m) # Calculate covariance between stock and market
beta = covariance[0,1]/covariance[1,1]
return beta
def rolling_apply(df, period, func, min_periods=None):
if min_periods is None:
min_periods = period
result = pd.Series(np.nan, index=df.index)
for i in range(1, len(df)+1):
sub_df = df.iloc[max(i-period, 0):i,:] #I edited here
if len(sub_df) >= min_periods:
idx = sub_df.index[-1]
result[idx] = func(sub_df)
return result
df['beta'] = np.nan
grp = df.groupby('id')
period = 6 #I'm using 6 to see some not NaN values, since sample data don't have longer than 12 groups
for stock, sub_df in grp:
beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period)
beta.name = 'beta'
df.update(beta)
print df
Output
输出
id period ret_1m mkt_ret_1m beta
131146 CAN00WG0 199609 -0.1538 0.047104 NaN
133530 CAN00WG0 199610 -0.0455 -0.014143 NaN
135913 CAN00WG0 199611 0.0000 0.040926 NaN
138334 CAN00WG0 199612 0.2952 0.008723 NaN
140794 CAN00WG0 199701 -0.0257 0.039916 NaN
143274 CAN00WG0 199702 -0.0038 -0.025442 -1.245908
145754 CAN00WG0 199703 -0.2992 -0.049279 2.574464
148246 CAN00WG0 199704 -0.0919 -0.005948 2.657887
150774 CAN00WG0 199705 0.0595 0.122322 1.371090
153318 CAN00WG0 199706 -0.0337 0.045765 1.494095
... ... ... ... ... ...
171399 CAN00WH0 199801 -0.0100 0.001381 NaN
174022 CAN00WH0 199802 0.1919 0.081924 1.542782
176637 CAN00WH0 199803 0.0085 0.050415 1.605407
179255 CAN00WH0 199804 -0.0168 0.018393 1.571015
181880 CAN00WH0 199805 0.0427 -0.051279 1.139972
184516 CAN00WH0 199806 -0.0656 -0.011516 1.101890
143275 CAN00WO0 199702 -0.1176 -0.025442 NaN
145755 CAN00WO0 199703 -0.0074 -0.049279 NaN
148247 CAN00WO0 199704 -0.0075 -0.005948 NaN
150775 CAN00WO0 199705 0.0451 0.122322 NaN
回答by vlmercado
Try pd.rolling_cov() and pd.rolling.var() as follows:
尝试 pd.rolling_cov() 和 pd.rolling.var() 如下:
import pandas as pd
import numpy as np
from StringIO import StringIO
df = pd.read_csv(StringIO(''' id period ret_1m mkt_ret_1m
131146 CAN00WG0 199609 -0.1538 0.047104
133530 CAN00WG0 199610 -0.0455 -0.014143
135913 CAN00WG0 199611 0.0000 0.040926
138334 CAN00WG0 199612 0.2952 0.008723
140794 CAN00WG0 199701 -0.0257 0.039916
143274 CAN00WG0 199702 -0.0038 -0.025442
145754 CAN00WG0 199703 -0.2992 -0.049279
148246 CAN00WG0 199704 -0.0919 -0.005948
150774 CAN00WG0 199705 0.0595 0.122322
153318 CAN00WG0 199706 -0.0337 0.045765
160980 CAN00WH0 199709 0.0757 0.079293
163569 CAN00WH0 199710 -0.0741 -0.044000
166159 CAN00WH0 199711 0.1000 -0.014644
168782 CAN00WH0 199712 -0.0909 -0.007072
171399 CAN00WH0 199801 -0.0100 0.001381
174022 CAN00WH0 199802 0.1919 0.081924
176637 CAN00WH0 199803 0.0085 0.050415
179255 CAN00WH0 199804 -0.0168 0.018393
181880 CAN00WH0 199805 0.0427 -0.051279
184516 CAN00WH0 199806 -0.0656 -0.011516
143275 CAN00WO0 199702 -0.1176 -0.025442
145755 CAN00WO0 199703 -0.0074 -0.049279
148247 CAN00WO0 199704 -0.0075 -0.005948
150775 CAN00WO0 199705 0.0451 0.122322'''), sep='\s+')
df['beta'] = pd.rolling_cov(df['ret_1m'], df['mkt_ret_1m'], window=6) / pd.rolling_var(df['mkt_ret_1m'], window=6)
print df
Output:
输出:
id period ret_1m mkt_ret_1m beta
131146 CAN00WG0 199609 -0.1538 0.047104 NaN
133530 CAN00WG0 199610 -0.0455 -0.014143 NaN
135913 CAN00WG0 199611 0.0000 0.040926 NaN
138334 CAN00WG0 199612 0.2952 0.008723 NaN
140794 CAN00WG0 199701 -0.0257 0.039916 NaN
143274 CAN00WG0 199702 -0.0038 -0.025442 -1.245908
145754 CAN00WG0 199703 -0.2992 -0.049279 2.574464
148246 CAN00WG0 199704 -0.0919 -0.005948 2.657887
150774 CAN00WG0 199705 0.0595 0.122322 1.371090
153318 CAN00WG0 199706 -0.0337 0.045765 1.494095
160980 CAN00WH0 199709 0.0757 0.079293 1.616520
163569 CAN00WH0 199710 -0.0741 -0.044000 1.630411
166159 CAN00WH0 199711 0.1000 -0.014644 0.651220
168782 CAN00WH0 199712 -0.0909 -0.007072 0.652148
171399 CAN00WH0 199801 -0.0100 0.001381 0.724120
174022 CAN00WH0 199802 0.1919 0.081924 1.542782
176637 CAN00WH0 199803 0.0085 0.050415 1.605407
179255 CAN00WH0 199804 -0.0168 0.018393 1.571015
181880 CAN00WH0 199805 0.0427 -0.051279 1.139972
184516 CAN00WH0 199806 -0.0656 -0.011516 1.101890
143275 CAN00WO0 199702 -0.1176 -0.025442 1.372437
145755 CAN00WO0 199703 -0.0074 -0.049279 0.031939
148247 CAN00WO0 199704 -0.0075 -0.005948 -0.535855
150775 CAN00WO0 199705 0.0451 0.122322 0.341747