Python: How to compute volatility (standard deviation) in a rolling window in Pandas

Question

Asked by Thegamer23

I have a time series "Ser" and I want to compute volatilities (standard deviations) with a rolling window. My current code correctly does it in this form:

w = 10
length = len(Ser) - w + 1  # number of full windows
volList = []
for timestep in range(length):
    subSer = Ser[timestep:timestep+w]
    mean_i = np.mean(subSer)
    vol_i = (np.sum((subSer-mean_i)**2)/len(subSer))**0.5
    volList.append(vol_i)

This seems to me very inefficient. Does Pandas have built-in functionality for doing something like this?

Answer 1

Answered by Mad Physicist

It looks like you are looking for Series.rolling. You can apply the std calculation to the resulting object:

roller = Ser.rolling(w)
volList = roller.std(ddof=0)

If you don't plan on using the rolling window object again, you can write a one-liner:

volList = Ser.rolling(w).std(ddof=0)

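Keep in mind that ddof=0 is necessary in this case because the standard deviation is normalized by N - ddof, where N is the number of observations in each window, and ddof defaults to 1 in pandas. The code in the question divides by len(subSer), which corresponds to ddof=0.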
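
As a rough illustration of the ddof point, the sketch below compares the two normalizations on a small made-up series (the values, `ser`, and `w` are assumptions for the example, not from the question). It checks the first full window against NumPy, whose np.std defaults to ddof=0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
ser = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
w = 3

pop_std = ser.rolling(w).std(ddof=0)   # divides by w (matches the loop in the question)
sample_std = ser.rolling(w).std()      # pandas default ddof=1, divides by w - 1

# The first full window ends at position w - 1.
first_window = ser.iloc[:w].to_numpy()
print(np.isclose(pop_std.iloc[w - 1], np.std(first_window)))             # NumPy default is ddof=0
print(np.isclose(sample_std.iloc[w - 1], np.std(first_window, ddof=1)))
```

Positions before the first full window come back as NaN, so the result has the same length as the input rather than len(ser) - w + 1.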

Answer 2

Answered by aaron

Typically, [finance-type] people quote volatility in annualized terms of percent changes in price.

Assuming you have daily prices in a dataframe df and there are 252 trading days in a year, something like the following is probably what you want:

df.pct_change().rolling(window_size).std()*(252**0.5)
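
A self-contained sketch of the same one-liner, run on synthetic data. The price path, the "price" column name, and window_size = 21 are assumptions made for the example; only the 252-day convention comes from the answer. Note that std() is called with the pandas default ddof=1 here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical daily closing prices: a random walk in log space over 300 business days.
dates = pd.bdate_range("2020-01-01", periods=300)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=300)))
df = pd.DataFrame({"price": prices}, index=dates)

window_size = 21     # assumed: roughly one trading month
trading_days = 252   # trading days per year, as in the answer

# Rolling standard deviation of daily percent changes, scaled to annual terms.
ann_vol = df["price"].pct_change().rolling(window_size).std() * trading_days ** 0.5
print(ann_vol.dropna().head())
```

The first window_size entries are NaN: pct_change() loses the first observation, and the rolling window needs window_size valid returns before it produces a value.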

Answer 3

Answered by Divakar

Here's one NumPy approach -

import numpy as np

# From http://stackoverflow.com/a/14314054/3293881 by @Jaime
def moving_average(a, n=3) :
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# From http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S=1 ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def rolling_meansqdiff_numpy(a, w):
    A = strided_app(a, w)
    B = moving_average(a,w)
    subs = A-B[:,None]
    sums = np.einsum('ij,ij->i',subs,subs)
    return (sums/w)**0.5

Sample run -

In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

In [203]: rolling_meansqdiff_loopy(Ser, w=10)
Out[203]: 
[2.6095976701399777,
 2.3000000000000003,
 2.118962010041709,
 2.022374841615669,
 1.746424919657298,
 1.7916472867168918,
 1.3000000000000003,
 1.7776388834631178,
 1.6852299546352716,
 1.6881943016134133,
 1.7578395831246945]

In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
Out[204]: 
array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
        1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
        1.75783958])
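
On NumPy 1.20 or later, the windowing step can also be sketched with numpy.lib.stride_tricks.sliding_window_view instead of a hand-written as_strided call. This is a swapped-in alternative, not the approach used in the answer, and the sample array below is made up for the example. It is checked against the pandas result with ddof=0.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
a = rng.integers(0, 9, size=20).astype(float)   # hypothetical sample data
w = 10

# Read-only view of shape (len(a) - w + 1, w); no data is copied.
windows = sliding_window_view(a, w)

# Population standard deviation (ddof=0) of each window.
vol = windows.std(axis=1)

# Cross-check against pandas, dropping the leading NaNs of incomplete windows.
expected = pd.Series(a).rolling(w).std(ddof=0).dropna().to_numpy()
print(np.allclose(vol, expected))
```

sliding_window_view validates the window shape for you, which avoids the out-of-bounds reads that a mistaken as_strided call can produce.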

Runtime test

Loopy approach -

def rolling_meansqdiff_loopy(Ser, w):
    length = Ser.shape[0]- w + 1
    volList= []
    for timestep in range(length):
        subSer=Ser[timestep:timestep+w]
        mean_i=np.mean(subSer)
        vol_i=(np.sum((subSer-mean_i)**2)/len(subSer))**0.5
        volList.append(vol_i)
    return volList

Timings -

In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
1 loops, best of 3: 2.63 s per loop

# @Mad Physicist's vectorized soln
In [225]: %timeit Ser.rolling(10).std(ddof=0)
1000 loops, best of 3: 380 μs per loop

In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
1000 loops, best of 3: 393 μs per loop

A speedup of close to 7000x there with the two vectorized approaches over the loopy one!

Answer 4

Answered by mcguip

"Volatility" is ambiguous even in a financial sense. The most commonly referenced type of volatility is realized volatility, which is the square root of realized variance. The key differences from the standard deviation of returns are:

Log returns (not simple returns) are used
The figure is annualized (usually assuming between 252 and 260 trading days per year)
In the case of variance swaps, log returns are not demeaned

There are a variety of methods for computing realized volatility; however, I have implemented the two most common below:

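import numpy as np

window = 21  # trading days in rolling window
days_per_year = 252  # trading days per year
ann_factor = days_per_year / window  # scales a sum over the window to one year

# df is assumed to be a DataFrame with a 'price' column of daily prices
df['log_rtn'] = np.log(df['price']).diff()

# Var Swap (returns are not demeaned)
df['real_var'] = np.square(df['log_rtn']).rolling(window).sum() * ann_factor
df['real_vol'] = np.sqrt(df['real_var'])

# Classical (returns are demeaned, ddof=1); an alternative that overwrites the columns above
# .var() already divides by (window - 1), so it is scaled by days_per_year, not ann_factor
df['real_var'] = df['log_rtn'].rolling(window).var() * days_per_year
df['real_vol'] = np.sqrt(df['real_var'])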
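
To see how the two estimators compare, here is a minimal runnable sketch on synthetic data. The price path and the column names vol_varswap and vol_classical are assumptions for the example. The variance-swap style figure uses the rolling mean of squared log returns, and the classical figure uses the rolling sample standard deviation; both are scaled by 252 trading days.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical prices: geometric random walk with 1% daily log-return volatility.
log_rtn_true = rng.normal(0.0, 0.01, size=500)
df = pd.DataFrame({"price": 100 * np.exp(np.cumsum(log_rtn_true))})

window = 21          # trading days in rolling window
days_per_year = 252  # trading days per year

df["log_rtn"] = np.log(df["price"]).diff()

# Variance-swap style: mean squared log return (no demeaning), annualized.
df["vol_varswap"] = np.sqrt((df["log_rtn"] ** 2).rolling(window).mean() * days_per_year)

# Classical: sample standard deviation (ddof=1) of log returns, annualized.
df["vol_classical"] = df["log_rtn"].rolling(window).std() * np.sqrt(days_per_year)

print(df[["vol_varswap", "vol_classical"]].dropna().describe())
```

With a daily volatility of 1%, both series should hover around 0.01 * sqrt(252), roughly 16% annualized. They differ only through the rolling mean return and the n versus n - 1 normalization, so the gap is small when the mean daily return is close to zero.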
