Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19121854/

Using rolling_apply on a DataFrame object

Tags: python, pandas

Asked by nitin

I am trying to calculate Volume Weighted Average Price on a rolling basis.

To do this, I have a function vwap, like so:

def vwap(bars):
    return ((bars.Close*bars.Volume).sum()/bars.Volume.sum()).round(2)
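
For reference, here is a small worked example of what vwap computes on a hand-made frame. The three-row data is invented for illustration; only the Close and Volume column names come from the question.

```python
import pandas as pd

def vwap(bars):
    # Sum of price * volume divided by total volume, rounded to 2 decimals.
    return ((bars.Close * bars.Volume).sum() / bars.Volume.sum()).round(2)

# Invented sample data; only the column names match the question.
bars = pd.DataFrame({
    "Close": [10.0, 20.0, 30.0],
    "Volume": [100, 100, 200],
})

# (10*100 + 20*100 + 30*200) / (100 + 100 + 200) = 9000 / 400 = 22.5
result = vwap(bars)
print(result)
```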

When I try to use this function with rolling_apply, as shown, I get an error:

import pandas
import pandas.io.data as web
bars = web.DataReader('AAPL','yahoo')
print(pandas.rolling_apply(bars,30,vwap))

AttributeError: 'numpy.ndarray' object has no attribute 'Close'

The error makes sense to me because rolling_apply expects a Series or an ndarray as input, not a DataFrame, which is what I am passing.
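
To see why the attribute lookup fails, the sketch below probes what a rolling apply hands to the function. It uses the newer DataFrame.rolling(...).apply(...) spelling with raw=True rather than the pandas.rolling_apply call from the question, which later pandas versions no longer provide. The sample data and the probe helper are made up for illustration.

```python
import pandas as pd

# Invented sample data with the same column names as in the question.
bars = pd.DataFrame({
    "Close": [10.0, 11.0, 12.0, 13.0],
    "Volume": [100.0, 200.0, 300.0, 400.0],
})

seen_types = set()

def probe(window):
    # Record the type of the object passed in for each window.
    seen_types.add(type(window).__name__)
    return window.sum()

# The function is called per window and per column with a 1-D array of
# values, so there is no .Close or .Volume attribute to look up.
result = bars.rolling(2).apply(probe, raw=True)
print(seen_types)
```

Each call only ever sees one column at a time, which is why a function that needs both Close and Volume cannot be passed in this way.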

Is there a way to use rolling_apply on a DataFrame to solve my problem?

Accepted answer by Jeff

This is not directly supported, but you can do it like this:

In [29]: bars
Out[29]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 942 entries, 2010-01-04 00:00:00 to 2013-09-30 00:00:00
Data columns (total 6 columns):
Open         942  non-null values
High         942  non-null values
Low          942  non-null values
Close        942  non-null values
Volume       942  non-null values
Adj Close    942  non-null values
dtypes: float64(5), int64(1)

window=30

In [30]: concat([ (Series(vwap(bars.iloc[i:i+window]),
                      index=[bars.index[i+window]])) for i in range(len(bars)-window) ])
Out[30]: 
2010-02-17    203.21
2010-02-18    202.95
2010-02-19    202.64
2010-02-22    202.41
2010-02-23    202.19
2010-02-24    201.85
2010-02-25    201.65
2010-02-26    201.50
2010-03-01    201.31
2010-03-02    201.35
2010-03-03    201.42
2010-03-04    201.09
2010-03-05    200.95
2010-03-08    201.50
2010-03-09    202.02
...
2013-09-10    485.94
2013-09-11    487.38
2013-09-12    486.77
2013-09-13    487.23
2013-09-16    487.20
2013-09-17    486.09
2013-09-18    485.52
2013-09-19    485.30
2013-09-20    485.37
2013-09-23    484.87
2013-09-24    485.81
2013-09-25    486.41
2013-09-26    486.07
2013-09-27    485.30
2013-09-30    484.74
Length: 912
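
As an alternative to building one Series per window, a rolling VWAP can also be written without any apply step, because the numerator and the denominator are both plain rolling sums. This is a different technique from the one in the answer above, sketched against the newer .rolling() interface. The sample data is invented, and each value is labeled with the last bar of its window (the answer above labels it with the following bar).

```python
import pandas as pd

# Invented sample data; only the column names match the question.
bars = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0, 40.0, 50.0],
     "Volume": [100.0, 100.0, 200.0, 100.0, 300.0]},
    index=pd.date_range("2013-09-23", periods=5, freq="D"),
)
window = 3

# Rolling sum of price * volume divided by the rolling sum of volume.
price_volume = (bars["Close"] * bars["Volume"]).rolling(window).sum()
rolling_vwap = (price_volume / bars["Volume"].rolling(window).sum()).round(2)

# The first window - 1 entries are NaN because the window is not yet full.
print(rolling_vwap)
```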

Answer by mathtick

A cleaned-up version for reference; hopefully I got the indexing correct:

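import numpy as np
import pandas

def myrolling_apply(df, N, f, nn=1):
    # Start positions of each window of N rows, stepping by nn.
    ii = [int(x) for x in np.arange(0, df.shape[0] - N + 1, nn)]
    # Apply f to each window; f must return a single value.
    out = [f(df.iloc[i:(i + N)]) for i in ii]
    out = pandas.Series(out)
    # Label each result with the last row of its window.
    out.index = df.index[N-1::nn]
    return out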
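
A short usage sketch for the helper above, assuming the vwap function from the question and an invented five-row frame. With nn=2 only every second window is evaluated, and each result is labeled with the last row of its window.

```python
import numpy as np
import pandas as pd

def vwap(bars):
    return ((bars.Close * bars.Volume).sum() / bars.Volume.sum()).round(2)

def myrolling_apply(df, N, f, nn=1):
    # Same logic as the helper above, with explicit numpy/pandas imports.
    ii = [int(x) for x in np.arange(0, df.shape[0] - N + 1, nn)]
    out = pd.Series([f(df.iloc[i:(i + N)]) for i in ii])
    out.index = df.index[N - 1::nn]
    return out

# Invented sample data; only the column names match the question.
bars = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0, 40.0, 50.0],
     "Volume": [100.0, 100.0, 200.0, 100.0, 300.0]},
    index=pd.date_range("2013-09-23", periods=5, freq="D"),
)

every_window = myrolling_apply(bars, 3, vwap)        # windows starting at rows 0, 1, 2
every_other = myrolling_apply(bars, 3, vwap, nn=2)   # windows starting at rows 0 and 2
print(every_window)
print(every_other)
```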

Answer by citynorman

Modified @mathtick's answer to include na_fill. Also note that your function f needs to return a single value; it can't return a DataFrame with multiple columns.

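import numpy as np
import pandas as pd

def rolling_apply_df(dfg, N, f, nn=1, na_fill=True):
    # Start positions of each window of N rows, stepping by nn.
    ii = [int(x) for x in np.arange(0, dfg.shape[0] - N + 1, nn)]
    out = [f(dfg.iloc[i:(i + N)]) for i in ii]
    if na_fill:
        # Pad the first N-1 positions with NaN so the result aligns with dfg.
        # Note: the lengths only line up when nn == 1.
        out = pd.Series(np.concatenate([np.repeat(np.nan, N-1), np.array(out)]))
        out.index = dfg.index[::nn]
    else:
        # Label each result with the last row of its window.
        out = pd.Series(out)
        out.index = dfg.index[N-1::nn]
    return out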
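
Since f has to return a single value, one way to get several rolling statistics is to call the helper once per statistic and combine the results. Below is a sketch under that assumption, using the na_fill=True path with nn=1; the sample data and the two lambda statistics are invented.

```python
import numpy as np
import pandas as pd

def rolling_apply_df(dfg, N, f, nn=1, na_fill=True):
    # Same logic as the helper above.
    ii = [int(x) for x in np.arange(0, dfg.shape[0] - N + 1, nn)]
    out = [f(dfg.iloc[i:(i + N)]) for i in ii]
    if na_fill:
        out = pd.Series(np.concatenate([np.repeat(np.nan, N - 1), np.array(out)]))
        out.index = dfg.index[::nn]
    else:
        out = pd.Series(out)
        out.index = dfg.index[N - 1::nn]
    return out

# Invented sample data; only the column names match the question.
bars = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0, 40.0],
     "Volume": [100.0, 100.0, 200.0, 100.0]},
    index=pd.date_range("2013-09-23", periods=4, freq="D"),
)

# One scalar-valued function per output column.
stats = pd.DataFrame({
    "vwap": rolling_apply_df(bars, 3, lambda w: (w.Close * w.Volume).sum() / w.Volume.sum()),
    "total_volume": rolling_apply_df(bars, 3, lambda w: w.Volume.sum()),
})
print(stats)
```

Because na_fill=True keeps the original index, the combined frame lines up row for row with bars.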