Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta

Question

提问by Carl

I have a large data frame, df, containing 4 columns:

我有一个大数据框 df，包含 4 列：

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.047104
133530       CAN00WG0     199610 -0.0455   -0.014143
135913       CAN00WG0     199611  0.0000    0.040926
138334       CAN00WG0     199612  0.2952    0.008723
140794       CAN00WG0     199701 -0.0257    0.039916
143274       CAN00WG0     199702 -0.0038   -0.025442
145754       CAN00WG0     199703 -0.2992   -0.049279
148246       CAN00WG0     199704 -0.0919   -0.005948
150774       CAN00WG0     199705  0.0595    0.122322
153318       CAN00WG0     199706 -0.0337    0.045765

             id           period  ret_1m   mkt_ret_1m
160980       CAN00WH0     199709  0.0757    0.079293
163569       CAN00WH0     199710 -0.0741   -0.044000
166159       CAN00WH0     199711  0.1000   -0.014644
168782       CAN00WH0     199712 -0.0909   -0.007072
171399       CAN00WH0     199801 -0.0100    0.001381
174022       CAN00WH0     199802  0.1919    0.081924
176637       CAN00WH0     199803  0.0085    0.050415
179255       CAN00WH0     199804 -0.0168    0.018393
181880       CAN00WH0     199805  0.0427   -0.051279
184516       CAN00WH0     199806 -0.0656   -0.011516

             id           period  ret_1m   mkt_ret_1m
143275       CAN00WO0     199702 -0.1176   -0.025442
145755       CAN00WO0     199703 -0.0074   -0.049279
148247       CAN00WO0     199704 -0.0075   -0.005948
150775       CAN00WO0     199705  0.0451    0.122322

etc.

等等。

I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.

我正在尝试使用一个函数计算一个常见的财务指标，称为 beta，该函数采用两列，ret_1m，每月 stock_return 和 ret_1m_mkt，即同一时期 (period_id) 的市场 1 个月回报。我想应用一个函数 (calc_beta) 来计算这个函数在 12 个月滚动基础上的 12 个月结果。

To do this, I am creating a groupby object:

为此，我正在创建一个 groupby 对象：

grp = df.groupby('id')

What I would like to do is use something like:

我想做的是使用类似的东西：

period = 12
for stock, sub_df in grp:
    arg = sub_df[['ret_1m', 'mkt_ret_1m']]
    beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)

Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:

现在，这是第一个问题。根据文档， pd.rolling_apply arg 可以是系列或数据框。但是，我提供的数据框似乎被转换为一个只能包含一列数据的 numpy 数组，而不是我试图提供的两列数据。所以我下面的 calc_beta 代码将不起作用，因为我需要同时传递股票和市场回报：

def calc_beta(np_array)
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
return beta

So my questions are as follows, I think it makes sense to list them in this way:

所以我的问题如下，我认为这样列出它们是有道理的：

(i)  How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function? 
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this?  I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.

I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.versionis 0.16.1.

我收集到 rolling_apply 以前无法处理数据帧，但文档表明它现在能够这样做。我的Pandas 版本是 0.16.1。

Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.

谢谢你的帮助！我已经失去了 1.5 天试图弄清楚这一点，我完全被难住了。

Ultimately, what I want is something like this:

最终，我想要的是这样的：

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
131146       CAN00WG0     199609 -0.1538    0.047104  0.521  xxx
133530       CAN00WG0     199610 -0.0455   -0.014143  0.627  xxxx
135913       CAN00WG0     199611  0.0000    0.040926  0.341  xxx
138334       CAN00WG0     199612  0.2952    0.008723  0.567  xx
140794       CAN00WG0     199701 -0.0257    0.039916  0.4612 xxx
143274       CAN00WG0     199702 -0.0038   -0.025442  0.215  xxx
145754       CAN00WG0     199703 -0.2992   -0.049279  0.4678  xxx
148246       CAN00WG0     199704 -0.0919   -0.005948  -0.4225  xxx
150774       CAN00WG0     199705  0.0595    0.122322  0.780  xxx
153318       CAN00WG0     199706 -0.0337    0.045765  0.623  xxx

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
160980       CAN00WH0     199709  0.0757    0.079293  -0.913  xx
163569       CAN00WH0     199710 -0.0741   -0.044000  0.894  xxx
166159       CAN00WH0     199711  0.1000   -0.014644  0.563  xxx
168782       CAN00WH0     199712 -0.0909   -0.007072  0.734  xxx
171399       CAN00WH0     199801 -0.0100    0.001381  0.894  xxxx
174022       CAN00WH0     199802  0.1919    0.081924  0.789  xx
176637       CAN00WH0     199803  0.0085    0.050415  0.1563  xxxx
179255       CAN00WH0     199804 -0.0168    0.018393  -0.64  xxxx
181880       CAN00WH0     199805  0.0427   -0.051279  -0.742  xxx
184516       CAN00WH0     199806 -0.0656   -0.011516  0.925  xxx

             id           period  ret_1m   mkt_ret_1m  beta
143275       CAN00WO0     199702 -0.1176   -0.025442  -1.52  xx
145755       CAN00WO0     199703 -0.0074   -0.049279  -0.632  xxx
148247       CAN00WO0     199704 -0.0075   -0.005948  1.521  xx
150775       CAN00WO0     199705  0.0451    0.122322  0.0321  xxx

etc.

等等。

Answer 1

回答by Happy001

I guess pd.rolling_applydoesn't help in this case since it seems to me that it essentially only takes a Series(Even if a dataframe is passed, it's processing one column a time). But you can always write your own rolling_apply that takes a dataframe.

我想pd.rolling_apply在这种情况下没有帮助，因为在我看来它基本上只需要一个Series（即使传递了一个数据帧，它也一次处理一列）。但是您始终可以编写自己的带有数据帧的 rolling_apply。

import pandas as pd
import numpy as np
from StringIO import StringIO

df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
131146  CAN00WG0  199609 -0.1538    0.047104
133530  CAN00WG0  199610 -0.0455   -0.014143
135913  CAN00WG0  199611  0.0000    0.040926
138334  CAN00WG0  199612  0.2952    0.008723
140794  CAN00WG0  199701 -0.0257    0.039916
143274  CAN00WG0  199702 -0.0038   -0.025442
145754  CAN00WG0  199703 -0.2992   -0.049279
148246  CAN00WG0  199704 -0.0919   -0.005948
150774  CAN00WG0  199705  0.0595    0.122322
153318  CAN00WG0  199706 -0.0337    0.045765
160980  CAN00WH0  199709  0.0757    0.079293
163569  CAN00WH0  199710 -0.0741   -0.044000
166159  CAN00WH0  199711  0.1000   -0.014644
168782  CAN00WH0  199712 -0.0909   -0.007072
171399  CAN00WH0  199801 -0.0100    0.001381
174022  CAN00WH0  199802  0.1919    0.081924
176637  CAN00WH0  199803  0.0085    0.050415
179255  CAN00WH0  199804 -0.0168    0.018393
181880  CAN00WH0  199805  0.0427   -0.051279
184516  CAN00WH0  199806 -0.0656   -0.011516
143275  CAN00WO0  199702 -0.1176   -0.025442
145755  CAN00WO0  199703 -0.0074   -0.049279
148247  CAN00WO0  199704 -0.0075   -0.005948
150775  CAN00WO0  199705  0.0451    0.122322'''), sep='\s+')



def calc_beta(df):
    np_array = df.values
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
    return beta

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)

    for i in range(1, len(df)+1):
        sub_df = df.iloc[max(i-period, 0):i,:] #I edited here
        if len(sub_df) >= min_periods:
            idx = sub_df.index[-1]
            result[idx] = func(sub_df)
    return result

df['beta'] = np.nan
grp = df.groupby('id')
period = 6 #I'm using 6  to see some not NaN values, since sample data don't have longer than 12 groups
for stock, sub_df in grp:
    beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period)  
    beta.name = 'beta'
    df.update(beta)
print df

Output

输出

            id  period  ret_1m  mkt_ret_1m      beta
131146  CAN00WG0  199609 -0.1538    0.047104       NaN
133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
135913  CAN00WG0  199611  0.0000    0.040926       NaN
138334  CAN00WG0  199612  0.2952    0.008723       NaN
140794  CAN00WG0  199701 -0.0257    0.039916       NaN
143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
150774  CAN00WG0  199705  0.0595    0.122322  1.371090
153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
...          ...     ...     ...         ...       ...
171399  CAN00WH0  199801 -0.0100    0.001381       NaN
174022  CAN00WH0  199802  0.1919    0.081924  1.542782
176637  CAN00WH0  199803  0.0085    0.050415  1.605407
179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
143275  CAN00WO0  199702 -0.1176   -0.025442       NaN
145755  CAN00WO0  199703 -0.0074   -0.049279       NaN
148247  CAN00WO0  199704 -0.0075   -0.005948       NaN
150775  CAN00WO0  199705  0.0451    0.122322       NaN

Answer 2

回答by vlmercado

Try pd.rolling_cov() and pd.rolling.var() as follows:

尝试 pd.rolling_cov() 和 pd.rolling.var() 如下：

import pandas as pd
import numpy as np
from StringIO import StringIO

    df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
    131146  CAN00WG0  199609 -0.1538    0.047104
    133530  CAN00WG0  199610 -0.0455   -0.014143
    135913  CAN00WG0  199611  0.0000    0.040926
    138334  CAN00WG0  199612  0.2952    0.008723
    140794  CAN00WG0  199701 -0.0257    0.039916
    143274  CAN00WG0  199702 -0.0038   -0.025442
    145754  CAN00WG0  199703 -0.2992   -0.049279
    148246  CAN00WG0  199704 -0.0919   -0.005948
    150774  CAN00WG0  199705  0.0595    0.122322
    153318  CAN00WG0  199706 -0.0337    0.045765
    160980  CAN00WH0  199709  0.0757    0.079293
    163569  CAN00WH0  199710 -0.0741   -0.044000
    166159  CAN00WH0  199711  0.1000   -0.014644
    168782  CAN00WH0  199712 -0.0909   -0.007072
    171399  CAN00WH0  199801 -0.0100    0.001381
    174022  CAN00WH0  199802  0.1919    0.081924
    176637  CAN00WH0  199803  0.0085    0.050415
    179255  CAN00WH0  199804 -0.0168    0.018393
    181880  CAN00WH0  199805  0.0427   -0.051279
    184516  CAN00WH0  199806 -0.0656   -0.011516
    143275  CAN00WO0  199702 -0.1176   -0.025442
    145755  CAN00WO0  199703 -0.0074   -0.049279
    148247  CAN00WO0  199704 -0.0075   -0.005948
    150775  CAN00WO0  199705  0.0451    0.122322'''), sep='\s+')

    df['beta'] = pd.rolling_cov(df['ret_1m'], df['mkt_ret_1m'], window=6) / pd.rolling_var(df['mkt_ret_1m'], window=6)

print df

Output:

输出：

              id  period  ret_1m  mkt_ret_1m      beta
131146  CAN00WG0  199609 -0.1538    0.047104       NaN
133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
135913  CAN00WG0  199611  0.0000    0.040926       NaN
138334  CAN00WG0  199612  0.2952    0.008723       NaN
140794  CAN00WG0  199701 -0.0257    0.039916       NaN
143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
150774  CAN00WG0  199705  0.0595    0.122322  1.371090
153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
160980  CAN00WH0  199709  0.0757    0.079293  1.616520
163569  CAN00WH0  199710 -0.0741   -0.044000  1.630411
166159  CAN00WH0  199711  0.1000   -0.014644  0.651220
168782  CAN00WH0  199712 -0.0909   -0.007072  0.652148
171399  CAN00WH0  199801 -0.0100    0.001381  0.724120
174022  CAN00WH0  199802  0.1919    0.081924  1.542782
176637  CAN00WH0  199803  0.0085    0.050415  1.605407
179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
143275  CAN00WO0  199702 -0.1176   -0.025442  1.372437
145755  CAN00WO0  199703 -0.0074   -0.049279  0.031939
148247  CAN00WO0  199704 -0.0075   -0.005948 -0.535855
150775  CAN00WO0  199705  0.0451    0.122322  0.341747

Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta

提问by Carl

回答by Happy001

回答by vlmercado

相关推荐

最近更新

标签

Python pandas 使用滚动应用到 groupby 对象以矢量化方式计算机车车辆 beta

提问by Carl

回答by Happy001

回答by vlmercado

相关推荐

通过 Python 中的 Pandas 将每日股票数据转换为每周股票数据

Pandas DataFrame 列分配 ValueError：传递的项目数错误

pandas Python 将类方法应用于数据框的行

pandas 在熊猫中将多行连接到一行

相关推荐

最近更新

标签