Efficient VWAP calculation in Pandas

Question

Asked by Zhubarb

With the code below, I can calculate the volume-weighted average price (VWAP) in three lines of Pandas code.

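import datetime as dt

import numpy as np
import pandas as pd
# pandas.io.data was removed from pandas; DataReader now lives in the
# separate pandas-datareader package.
from pandas_datareader.data import DataReader

# A single ticker string (rather than a list) is used so the result has flat columns.
df = DataReader('AAPL', 'yahoo', dt.datetime(2013, 12, 30), dt.datetime(2014, 12, 30))
df['Cum_Vol'] = df['Volume'].cumsum()
# Typical price (High + Low + Close) / 3, weighted by volume
df['Cum_Vol_Price'] = (df['Volume'] * (df['High'] + df['Low'] + df['Close']) / 3).cumsum()
df['VWAP'] = df['Cum_Vol_Price'] / df['Cum_Vol']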

I am trying to find a way to code this without using cumsum(), as an exercise. I am looking for a solution that produces the VWAP column in one pass. I have tried the line below, using .apply(). The logic is there, but the issue is that I cannot store values from row n in order to use them in row (n+1). How do you approach this in pandas: just use an external tuple or dictionary for temporary storage of the cumulative values?

df['Cum_Vol']= np.nan
df['Cum_Vol_Price'] = np.nan
# calculate running cumulatives by apply - assume df row index is 0 to N
df['Cum_Vol'] = df.apply(lambda x: df.iloc[x.name-1]['Cum_Vol'] + x['Volume'] if int(x.name)>0 else x['Volume'], axis=1)

Is there a one-pass solution to the above problem?

EDIT:

My main motivation is to understand what is happening under the hood, so this is more of an exercise than something I need for a practical reason. I believe each cumsum on a Series of size N has time complexity O(N) (?). So I was wondering: instead of running two separate cumsums, can we calculate both in one pass, along the lines of this? I would be very happy to accept an explanation rather than working code.
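
One way to picture a single pass is to carry both running totals in local variables and emit the ratio at each row. The following is a minimal sketch, not a description of what pandas does internally. The function name `vwap_one_pass` and the sample numbers are made up for illustration; the typical price (high + low + close) / 3 follows the code in the question.

```python
import numpy as np

def vwap_one_pass(volume, high, low, close):
    """Running VWAP in a single loop, keeping two running totals."""
    cum_vol = 0.0        # running sum of volume
    cum_vol_price = 0.0  # running sum of volume * typical price
    out = np.empty(len(volume), dtype=float)
    for i in range(len(volume)):
        typical = (high[i] + low[i] + close[i]) / 3
        cum_vol += volume[i]
        cum_vol_price += volume[i] * typical
        out[i] = cum_vol_price / cum_vol
    return out

# Made-up sample data (hypothetical values, not real quotes)
volume = np.array([100.0, 200.0, 150.0])
high = np.array([10.0, 11.0, 12.0])
low = np.array([9.0, 10.0, 11.0])
close = np.array([9.5, 10.5, 11.5])

one_pass = vwap_one_pass(volume, high, low, close)

# Reference: the two-cumsum formulation from the question
two_pass = np.cumsum(volume * (high + low + close) / 3) / np.cumsum(volume)
```

Both formulations do O(N) work. A plain Python loop like this one is typically much slower than cumsum on large arrays, because each iteration runs in the interpreter; the point of the sketch is only to show where the intermediate state lives.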

Answer 1

Answered by JohnE

The difference between "one pass" and "one line" starts to become a matter of semantics. How about this for a distinction: you can do it with 1 line of pandas, 1 line of numpy, or several lines of numba.

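import numpy as np
import pandas as pd
from numba import jit

# Random test data: v = volume, h = high, l = low
df = pd.DataFrame(np.random.randn(10000, 3), columns=['v', 'h', 'l'])

df['vwap_pandas'] = (df.v * (df.h + df.l) / 2).cumsum() / df.v.cumsum()

@jit
def vwap():
    # v, h and l are module-level arrays, defined below before the first call
    tmp1 = np.zeros_like(v)  # running sum of volume * price
    tmp2 = np.zeros_like(v)  # running sum of volume
    for i in range(0, len(v)):
        # at i == 0, index -1 is the last element, which is still zero
        tmp1[i] = tmp1[i-1] + v[i] * (h[i] + l[i]) / 2.
        tmp2[i] = tmp2[i-1] + v[i]
    return tmp1 / tmp2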

v = df.v.values
h = df.h.values
l = df.l.values

df['vwap_numpy'] = np.cumsum(v*(h+l)/2) / np.cumsum(v)

df['vwap_numba'] = vwap()
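
Before comparing timings, it can be worth confirming that the different formulations produce the same numbers. The following is a small sketch of such a check for the pandas and numpy versions. It assumes positive, uniformly distributed volumes instead of the `randn` data above (which also yields negative "volumes", fine for timing but not realistic); the seed and value ranges are arbitrary choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 10_000
df = pd.DataFrame({
    'v': rng.uniform(1.0, 100.0, n),   # assumed positive volumes
    'h': rng.uniform(10.0, 11.0, n),   # assumed highs
    'l': rng.uniform(9.0, 10.0, n),    # assumed lows
})

# pandas version
vwap_pandas = (df.v * (df.h + df.l) / 2).cumsum() / df.v.cumsum()

# numpy version on the underlying arrays
v, h, l = df.v.values, df.h.values, df.l.values
vwap_numpy = np.cumsum(v * (h + l) / 2) / np.cumsum(v)

# True if the two results agree to floating-point tolerance
same = np.allclose(vwap_pandas.values, vwap_numpy)
```

With positive volumes, every VWAP value is a weighted average of the mid prices seen so far, so it must stay within the range of (h + l) / 2.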

Timings:

%timeit (df.v*(df.h+df.l)/2).cumsum() / df.v.cumsum()  # pandas
1000 loops, best of 3: 829 μs per loop

%timeit np.cumsum(v*(h+l)/2) / np.cumsum(v)            # numpy
10000 loops, best of 3: 165 μs per loop

%timeit vwap()                                         # numba
10000 loops, best of 3: 87.4 μs per loop

Answer 2

Answered by Ran Aroussi

Quick Edit: Just wanted to thank John for the original post :)

You can get even faster results by @jit-ing numpy's version:

@jit
def np_vwap():
    return np.cumsum(v*(h+l)/2) / np.cumsum(v)

This got me 50.9 μs per loop, as opposed to 74.5 μs per loop using the vwap version above.
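
Numbers like these depend on the machine and on how the measurement is taken. A function decorated with `@jit` is compiled on its first call, so a warm-up call before timing keeps compilation out of the result. Outside IPython, where `%timeit` is not available, the standard-library `timeit` module can be used instead. The sketch below times only the plain numpy version so that it runs without numba; the data ranges and the repeat counts are arbitrary assumptions.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
v = rng.uniform(1.0, 100.0, 10_000)
h = rng.uniform(10.0, 11.0, 10_000)
l = rng.uniform(9.0, 10.0, 10_000)

def np_vwap():
    return np.cumsum(v * (h + l) / 2) / np.cumsum(v)

# Warm-up call; for a @jit-decorated function this is where compilation happens
np_vwap()

# Best of 3 repeats of 1000 calls each, converted to seconds per call
best = min(timeit.repeat(np_vwap, number=1000, repeat=3)) / 1000
print(f"{best * 1e6:.1f} µs per loop")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that `%timeit` reported in the timings above.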
