Python custom function using rolling_apply for pandas
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/21025821/
Asked by h.l.m
I would like to use the pandas.rolling_apply function to apply my own custom function on a rolling-window basis, but my function takes two arguments and also returns two outputs. Is this possible?
Below is a minimal reproducible example:
import pandas as pd
import numpy as np
import random

tmp = pd.DataFrame(np.random.randn(2000,2)/10000,
                   index=pd.date_range('2001-01-01',periods=2000),
                   columns=['A','B'])

def gm(df,p):
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]

# an example output when subsetting for just 2001
gm(tmp['2001'],5)

# the aim is to do it on a rolling basis over a 50 day window
# whilst also getting both outputs and also allows me to add in the parameter p=5
# or any other number I want p to be...
pd.rolling_apply(tmp,50,gm)
This leads to an error, since gm takes two arguments.
Any help would be greatly appreciated.
EDIT
Following Jeff's comment I have made progress, but I am still struggling with outputs of two or more columns. If I instead write a new function (below) that just returns two random numbers (unconnected to the previous calculation) rather than the last row of v, I get the error TypeError: only length-1 arrays can be converted to Python scalars.
def gm2(df,p):
    df = pd.DataFrame(df)
    v =(((df+1).cumprod())-1)*p
    return np.random.rand(2)

pd.rolling_apply(tmp,50,lambda x: gm2(x,5)).tail(20)
This function works if the 2 is changed to 1.
Accepted answer by Jeff
rolling_apply passes numpy arrays to the applied function (at the moment); by 0.14 it should pass a frame. The issue is here
So redefine your function to work on a numpy array. (You can of course construct a DataFrame inside it, but your index/column names won't be the same.)
In [9]: def gm(df,p):
   ...:     v = ((np.cumprod(df+1))-1)*p
   ...:     return v[-1]
   ...:
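As a quick sanity check of what the numpy version computes on a single window, here is a minimal sketch. The sample values in `window` are hypothetical and not from the original post; the function body mirrors the answer's `gm`.

```python
import numpy as np

def gm(arr, p):
    # cumulative compounded return along the window, scaled by p
    v = (np.cumprod(arr + 1) - 1) * p
    # the last element covers the whole window
    return v[-1]

# hypothetical window of three small daily returns (illustration only)
window = np.array([0.01, -0.02, 0.03])
result = gm(window, 5)

# the last cumulative product is just the product over the whole window
expected = (np.prod(window + 1) - 1) * 5
```

The function returns one scalar for a 1-D array, which is the shape of result the rolling machinery expects from each window.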
If you want to use more pandas functions in your custom function, do this (note that the indices of the calling frame are not passed at the moment).
def gm(arr,p):
    # arr arrives as a numpy array, so wrap it to get pandas methods back
    df = pd.DataFrame(arr)
    v =(((df+1).cumprod())-1)*p
    return v.iloc[-1]
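The TypeError in the question's edit arises because the applied function is called on one column's window at a time and must hand back a single scalar, so a function like gm2 that returns two values per window cannot go through rolling_apply directly. One possible workaround is to loop over the windows by hand. The sketch below is not part of the original answer: `rolling_multi`, `gm_pair`, and the `names` parameter are hypothetical names introduced for illustration, and the sample frame is smaller than the one in the question.

```python
import numpy as np
import pandas as pd

def gm_pair(window, p):
    # hypothetical two-output function: window is a 2-D array (rows x columns);
    # returns the scaled compounded return of each column
    v = (np.cumprod(window + 1, axis=0) - 1) * p
    return v[-1]

def rolling_multi(df, size, func, names, *args):
    # hypothetical helper: call func on every full window of rows and
    # store its len(names) outputs in the row where the window ends
    values = df.to_numpy()
    out = np.full((len(df), len(names)), np.nan)
    for end in range(size, len(df) + 1):
        out[end - 1] = func(values[end - size:end], *args)
    return pd.DataFrame(out, index=df.index, columns=names)

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((120, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=120),
                   columns=['A', 'B'])

result = rolling_multi(tmp, 50, gm_pair, ['A', 'B'], 5)
```

A plain Python loop like this is slower than the built-in rolling functions, but it places no restriction on how many values the function returns per window.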
Pass it through a lambda:
In [11]: pd.rolling_apply(tmp,50,lambda x: gm(x,5)).tail(20)
Out[11]:
                   A         B
2006-06-04  0.004207 -0.002112
2006-06-05  0.003880 -0.001598
2006-06-06  0.003809 -0.002228
2006-06-07  0.002840 -0.003938
2006-06-08  0.002855 -0.004921
2006-06-09  0.002450 -0.004614
2006-06-10  0.001809 -0.004409
2006-06-11  0.001445 -0.005959
2006-06-12  0.001297 -0.006831
2006-06-13  0.000869 -0.007878
2006-06-14  0.000359 -0.008102
2006-06-15 -0.000885 -0.007996
2006-06-16 -0.001838 -0.008230
2006-06-17 -0.003036 -0.008658
2006-06-18 -0.002280 -0.008552
2006-06-19 -0.001398 -0.007831
2006-06-20 -0.000648 -0.007828
2006-06-21 -0.000799 -0.007616
2006-06-22 -0.001096 -0.006740
2006-06-23 -0.001160 -0.006004
[20 rows x 2 columns]
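For readers on a recent pandas release, where pd.rolling_apply has been deprecated and later removed, a rough equivalent can be sketched with the `DataFrame.rolling(...).apply(...)` method. This is not from the original answer. It assumes a pandas version whose `Rolling.apply` accepts `raw` and `args`, and it uses a smaller, seeded sample frame than the question does.

```python
import numpy as np
import pandas as pd

def gm(arr, p):
    # arr is a 1-D numpy array holding one column's window
    v = (np.cumprod(arr + 1) - 1) * p
    return v[-1]

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)) / 10000,
                   index=pd.date_range('2001-01-01', periods=200),
                   columns=['A', 'B'])

# raw=True passes each window as a numpy array, one column at a time;
# args forwards the extra positional parameter p to gm
rolled = tmp.rolling(50).apply(gm, raw=True, args=(5,))

# the lambda form from the answer gives the same result
via_lambda = tmp.rolling(50).apply(lambda x: gm(x, 5), raw=True)
```

The first 49 rows are NaN because a full 50-row window is not yet available. Either form still requires the function to return one scalar per window.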

