calling function with dataframe data gives error (cannot convert the series to <class 'float'>)
Warning: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/30824867/
Asked by John
I have an option pricing model (very simple Black Scholes) that works fine with data in this fashion:
In [18]:
BS2(100.,100.,1.,.001,.3)
Out[18]:
11.96762435837207
the function is here:
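# Assumed import: the TypeError shown later is what the math-module functions raise on a Series
from math import exp, log, pi, sqrt

# Black-Scholes function (call price; note the rate is hardcoded as .001 instead of using r)
def BS2(S,X,T,r,v):
    d1 = (log(S/X)+(.001+v*v/2)*T)/(v*sqrt(T))
    d2 = d1-v*sqrt(T)
    return (S*CND(d1)-X*exp(-.001*T)*CND(d2))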
I do not think it matters for this question, but BS2 calls this:
# Cumulative normal distribution function
def CND(X):
    (a1,a2,a3,a4,a5) = (0.31938153, -0.356563782, 1.781477937,
                        -1.821255978, 1.330274429)
    L = abs(X)
    K = 1.0 / (1.0 + 0.2316419 * L)
    w = 1.0 - 1.0 / sqrt(2*pi)*exp(-L*L/2.) * (a1*K + a2*K*K + a3*pow(K,3) +
                                               a4*pow(K,4) + a5*pow(K,5))
    if X<0:
        w = 1.0-w
    return w
I tried to modify the working BS function to accept data from a df but seem to have done something wrong:
def BS(df):
    d1 = (log(S/X)+(.001+v*v/2)*T)/(v*sqrt(T))
    d2 = d1-v*sqrt(T)
    return pd.Series((S*CND(d1)-X*exp(-.001*T)*CND(d2)))
My data is very straightforward:
In [13]:
df
Out[13]:
     S    X  T      r    v
0  100  100  1  0.001  0.3
1   50   50  1  0.001  0.3
and the columns are all float64:
In [14]:
df.dtypes
Out[14]:
S float64
X float64
T float64
r float64
v float64
dtype: object
I also tried assigning the df columns to names before calling the function (I tried it both with and without this assignment):
S=df['S']
X=df['X']
T=df['T']
r=df['r']
v=df['v']
at the risk of sending too much info, here is the error message:
In [18]:
BS(df)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-18-745e7dd0eb2c> in <module>()
----> 1 BS(df)
<ipython-input-17-b666a39cd530> in BS(df)
3 def BS(df):
4 CallPutFlag='c'
----> 5 d1 = (log(S/X)+(.001+v*v/2)*T)/(v*sqrt(T))
6 d2 = d1-v*sqrt(T)
7 cp = ((S*CND(d1)-X*exp(-.001*T)*CND(d2)))
C:\Users\camcompco\AppData\Roaming\Python\Python34\site-packages\pandas\core\series.py in wrapper(self)
74 return converter(self.iloc[0])
75 raise TypeError(
---> 76 "cannot convert the series to {0}".format(str(converter)))
77 return wrapper
78
TypeError: cannot convert the series to <class 'float'>
any assistance would be greatly appreciated.
John
Accepted answer by JonD
I think it would be easier to use dataframe.apply()
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.apply.html
Then the syntax would be df.apply(func, axis=1) to apply the function func to each row.
The answer to this question is similar:
Apply function to each row of pandas dataframe to create two new columns
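As a rough sketch of the row-wise approach, assuming the sample data from the question: with axis=1, apply hands each row to the function as a Series, and indexing that row yields scalars that the math-module functions accept. The names bs_row and norm_cdf are hypothetical, norm_cdf (built on math.erf) is a stand-in for the question's CND, and the rate is read from the r column rather than hardcoded.

```python
from math import erf, exp, log, sqrt

import pandas as pd


def norm_cdf(x):
    # Standard normal CDF via the error function (stand-in for the question's CND)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_row(row):
    # row is a single DataFrame row (a Series); indexing it yields scalar values
    S, X, T, r, v = row["S"], row["X"], row["T"], row["r"], row["v"]
    d1 = (log(S / X) + (r + v * v / 2) * T) / (v * sqrt(T))
    d2 = d1 - v * sqrt(T)
    return S * norm_cdf(d1) - X * exp(-r * T) * norm_cdf(d2)


df = pd.DataFrame({"S": [100.0, 50.0], "X": [100.0, 50.0], "T": [1.0, 1.0],
                   "r": [0.001, 0.001], "v": [0.3, 0.3]})

# axis=1 calls bs_row once per row and collects the results into a Series
df["price"] = df.apply(bs_row, axis=1)
print(df)
```

This stays close to the original scalar function, at the cost of one Python-level call per row.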
Answer by JohnE
@JonD's answer is good, but here's an alternate answer that will be faster if your dataframe has more than a few rows:
import numpy as np
from scipy.stats import norm

def BS2(df):
    # np.log, np.sqrt and np.exp operate elementwise on whole columns
    d1 = (np.log(df.S/df.X)+(.001+df.v*df.v/2)*df['T'])/(df.v*np.sqrt(df['T']))
    d2 = d1-df.v*np.sqrt(df['T'])
    return (df.S*norm.cdf(d1)-df.X*np.exp(-.001*df['T'])*norm.cdf(d2))
Changes:
- Main point is to vectorize the function. Syntax-wise the main change is to explicitly use numpy versions of sqrt, log, and exp. Otherwise you don't have to change much because numpy/pandas support basic math operations in an elementwise manner.
- Replaced user-written CND with norm.cdf from scipy. Much faster because built-in functions are almost always as fast as possible.
- This is minor, but I went with shortcut notation on df.X and others, but df['T'] needs to be written out since df.T would be interpreted as df.transpose(). I guess this is a good example of why you should avoid the shortcut notation but I'm lazy...
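A small illustration of the first and third points, using made-up values: math.log needs a single float and so fails on a multi-row Series, np.log works elementwise, and df.T returns the transposed frame rather than the T column.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"S": [100.0, 50.0, 25.0], "T": [1.0, 2.0, 3.0]})

# math.log tries to convert its argument to one float, which a 3-row Series cannot do
math_log_failed = False
try:
    math.log(df["S"])
except TypeError as err:
    math_log_failed = True
    print("math.log failed:", err)

# np.log is applied elementwise and returns a Series of the same length
print(np.log(df["S"]))

# df.T is the transposed frame (rows and columns swapped), not the column named T
print(df.T.shape)
print(df["T"].tolist())
```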
Btw, if you want even more speed, the next thing to try would be to do it in numpy rather than pandas. You could also check if others have already written Black-Scholes functions/libraries (probably, though I don't know anything about it).
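One possible reading of "do it in numpy" is to pull the columns out as plain arrays before doing the arithmetic. The sketch below assumes the question's sample data; the helper name bs_numpy is hypothetical, and it uses the r column rather than the hardcoded .001. Whether it is actually faster would need to be timed on real data.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def bs_numpy(S, X, T, r, v):
    # Black-Scholes call price computed on numpy arrays (hypothetical helper)
    sqrt_T = np.sqrt(T)
    d1 = (np.log(S / X) + (r + v * v / 2) * T) / (v * sqrt_T)
    d2 = d1 - v * sqrt_T
    return S * norm.cdf(d1) - X * np.exp(-r * T) * norm.cdf(d2)


df = pd.DataFrame({"S": [100.0, 50.0], "X": [100.0, 50.0], "T": [1.0, 1.0],
                   "r": [0.001, 0.001], "v": [0.3, 0.3]})

# to_numpy() returns each column as an ndarray, so the math skips pandas index handling
cols = [df[name].to_numpy() for name in ("S", "X", "T", "r", "v")]
df["price"] = bs_numpy(*cols)
print(df)
```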

