Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43284304/

Date: 2020-08-19 22:47:32  Source: igfitidea

How to compute volatility (standard deviation) in rolling window in Pandas

Tags: python, performance, pandas, numpy

Asked by Thegamer23

I have a time series "Ser" and I want to compute volatilities (standard deviations) with a rolling window. My current code correctly does it in this form:

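import numpy as np

w = 10
length = len(Ser) - w + 1  # number of full windows
volList = []
for timestep in range(length):
    subSer = Ser[timestep:timestep+w]
    mean_i = np.mean(subSer)
    # population standard deviation of the current window
    vol_i = (np.sum((subSer-mean_i)**2)/len(subSer))**0.5
    volList.append(vol_i)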

This seems to me very inefficient. Does Pandas have built-in functionality for doing something like this?

Answered by Mad Physicist

It looks like you are looking for Series.rolling. You can apply the std calculation to the resulting object:

roller = Ser.rolling(w)
volList = roller.std(ddof=0)

If you don't plan on using the rolling window object again, you can write a one-liner:

volList = Ser.rolling(w).std(ddof=0)

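Keep in mind that ddof=0 is necessary in this case because the standard deviation is normalized by N - ddof, where N is the number of observations in each window, and ddof defaults to 1 in pandas. The loop in the question divides by len(subSer), which corresponds to ddof=0.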
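
As a minimal sketch of the ddof point, the snippet below compares the two normalizations on a small made-up series (the data and the names ser, pop_std and samp_std are assumptions for illustration, not part of the original answer) and checks the last window against NumPy:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
ser = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
w = 3

pop_std = ser.rolling(w).std(ddof=0)   # divides by w (population estimate)
samp_std = ser.rolling(w).std()        # pandas default ddof=1, divides by w - 1

# The last window is [4, 7, 11]; compare against NumPy on the same values.
last_window = ser.to_numpy()[-w:]
print(pop_std.iloc[-1], np.std(last_window))           # both use ddof=0
print(samp_std.iloc[-1], np.std(last_window, ddof=1))  # both use ddof=1
```

The first w - 1 entries of both results are NaN because those positions do not yet have a full window.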

Answered by aaron

Typically, [finance-type] people quote volatility in annualized terms of percent changes in price.

Assuming you have daily prices in a dataframe df and there are 252 trading days in a year, something like the following is probably what you want:

df.pct_change().rolling(window_size).std()*(252**0.5)
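
To make the one-liner above concrete, here is a rough, self-contained sketch on synthetic prices. The random data, the 21-day window_size and the variable names are all assumptions for illustration; only the final expression comes from the answer.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (an assumption for illustration, not real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
daily_returns = rng.normal(0.0, 0.01, size=len(dates))
df = pd.DataFrame({"price": 100 * np.cumprod(1 + daily_returns)}, index=dates)

window_size = 21      # hypothetical window: about one trading month
trading_days = 252    # annualization convention used in the answer

# Rolling sample std (ddof=1) of daily percent changes, scaled to annual terms.
ann_vol = df.pct_change().rolling(window_size).std() * trading_days ** 0.5
print(ann_vol.dropna().head())
```

The first window_size rows are NaN: one from pct_change and the rest because the window is not yet full.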

Answered by Divakar

Here's one NumPy approach -

import numpy as np

# From http://stackoverflow.com/a/14314054/3293881 by @Jaime
def moving_average(a, n=3) :
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# From http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S=1 ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def rolling_meansqdiff_numpy(a, w):
    A = strided_app(a, w)
    B = moving_average(a,w)
    subs = A-B[:,None]
    sums = np.einsum('ij,ij->i',subs,subs)
    return (sums/w)**0.5

Sample run -

In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

In [203]: rolling_meansqdiff_loopy(Ser, w=10)
Out[203]: 
[2.6095976701399777,
 2.3000000000000003,
 2.118962010041709,
 2.022374841615669,
 1.746424919657298,
 1.7916472867168918,
 1.3000000000000003,
 1.7776388834631178,
 1.6852299546352716,
 1.6881943016134133,
 1.7578395831246945]

In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
Out[204]: 
array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
        1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
        1.75783958])
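
As an aside, on NumPy 1.20 or later the hand-written strided helper can be swapped for numpy.lib.stride_tricks.sliding_window_view. This is a different technique from the cumsum/einsum code above, not the answer's own method, and the function name rolling_std_windows is made up for this sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_std_windows(a, w):
    # Hypothetical helper: population std (ddof=0) of each full window of length w.
    windows = sliding_window_view(np.asarray(a, dtype=float), w)  # shape (len(a)-w+1, w)
    return windows.std(axis=1)

a = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print(rolling_std_windows(a, 4))
```

It returns one value per full window, like rolling_meansqdiff_numpy, but it has not been timed here, so no performance claim is made.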

Runtime test

Loopy approach -

def rolling_meansqdiff_loopy(Ser, w):
    length = Ser.shape[0]- w + 1
    volList= []
    for timestep in range(length):
        subSer=Ser[timestep:timestep+w]
        mean_i=np.mean(subSer)
        vol_i=(np.sum((subSer-mean_i)**2)/len(subSer))**0.5
        volList.append(vol_i)
    return volList

Timings -

In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
1 loops, best of 3: 2.63 s per loop

# @Mad Physicist's vectorized soln
In [225]: %timeit Ser.rolling(10).std(ddof=0)
1000 loops, best of 3: 380 μs per loop

In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
1000 loops, best of 3: 393 μs per loop

A speedup of close to 7000x there with the two vectorized approaches over the loopy one!

Answered by mcguip

"Volatility" is ambiguous even in a financial sense. The most commonly referenced type of volatility is realized volatility, which is the square root of realized variance. The key differences from the standard deviation of returns are:

  • Log returns (not simple returns) are used
  • The figure is annualized (usually assuming between 252 and 260 trading days per year)
  • In the case of variance swaps, log returns are not demeaned

There are a variety of methods for computing realized volatility; however, I have implemented the two most common below:

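import numpy as np

# df is assumed to be a DataFrame with a 'price' column of daily prices
window = 21  # trading days in rolling window
days_per_year = 252  # trading days per year
ann_factor = days_per_year / window

df['log_rtn'] = np.log(df['price']).diff()

# Var Swap (returns are not demeaned)
# The rolling sum covers `window` days, so scale by days_per_year / window
df['real_var'] = np.square(df['log_rtn']).rolling(window).sum() * ann_factor
df['real_vol'] = np.sqrt(df['real_var'])

# Classical (returns are demeaned, ddof=1); alternative to the block above
# var() is already a per-day average, so scale by days_per_year rather than ann_factor
df['real_var'] = df['log_rtn'].rolling(window).var() * days_per_year
df['real_vol'] = np.sqrt(df['real_var'])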
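
For a quick way to try both conventions side by side, here is a rough sketch on synthetic prices. The random data and the column names vol_varswap and vol_classical are assumptions for illustration. Note that this sketch scales the rolling variance by days_per_year rather than by days_per_year / window, on the reasoning that var() already averages over the window while the variance-swap sum does not.

```python
import numpy as np
import pandas as pd

# Synthetic prices (an assumption for illustration only).
rng = np.random.default_rng(1)
log_steps = rng.normal(0.0, 0.01, size=250)
df = pd.DataFrame({"price": 100 * np.exp(np.cumsum(log_steps))})

window = 21
days_per_year = 252
ann_factor = days_per_year / window

df["log_rtn"] = np.log(df["price"]).diff()

# Variance-swap convention: sum of squared, non-demeaned log returns over the window.
df["vol_varswap"] = np.sqrt(np.square(df["log_rtn"]).rolling(window).sum() * ann_factor)

# Classical convention: rolling sample variance (demeaned, ddof=1), annualized.
df["vol_classical"] = np.sqrt(df["log_rtn"].rolling(window).var() * days_per_year)

print(df[["vol_varswap", "vol_classical"]].dropna().tail())
```

With both scaled this way the two columns land on the same annualized scale and differ only through the demeaning and the ddof adjustment.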