Original source: http://stackoverflow.com/questions/47372274/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Apply custom function over multiple columns in pandas
Asked by eli
I am having trouble "applying" a custom function in Pandas. When I test the function by passing values directly, it works and returns the correct response. However, when I attempt to pass the column values, I receive the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
def feez(rides, plan):
    pmt4 = 200
    inc4 = 50  # number of rides included
    min_rate4 = 4
    if plan == "4 Plan":
        if rides > inc4:
            fee = ((rides - inc4) * min_rate4) + pmt4
        else:
            fee = pmt4
        return (fee)
    else:
        return 0.1
df['fee'].apply(feez(df.total_rides, df.plan_name))
Passing the values directly works, e.g. feez(800, "4 Plan") returns 3200.
However, I receive errors when I try to apply the function above.
I am a newbie and suspect my syntax is poorly written. Any ideas much appreciated. TIA. Eli
Answered by cs95
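apply (with axis=1) is meant to work on one row at a time, so passing entire columns as you are doing will not work. The call feez(df.total_rides, df.plan_name) runs immediately with two whole Series, so plan == "4 Plan" produces a boolean Series, and using that in an if statement raises the "truth value of a Series is ambiguous" error. In these instances, it's best to use a lambda.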
df['fee'] = df.apply(lambda x: feez(x['total_rides'], x['plan_name']), axis=1)
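To see the difference concretely, here is a minimal sketch that reuses feez from the question on a small made-up DataFrame. The sample rows are assumptions for illustration, not data from the question.

```python
import pandas as pd

def feez(rides, plan):
    pmt4 = 200       # flat payment for the "4 Plan"
    inc4 = 50        # number of rides included
    min_rate4 = 4    # charge per ride beyond the included ones
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "total_rides": [800, 30, 100],
    "plan_name": ["4 Plan", "4 Plan", "Other"],
})

# Passing whole columns makes `plan == "4 Plan"` a boolean Series,
# and `if <Series>` raises ValueError.
try:
    feez(df.total_rides, df.plan_name)
    raised = False
except ValueError:
    raised = True

# Row-wise apply hands the function one scalar per column,
# so it behaves the same as in the direct test.
df["fee"] = df.apply(lambda x: feez(x["total_rides"], x["plan_name"]), axis=1)
fees = df["fee"].tolist()
print(raised, fees)
```

With these rows, the first fee matches the 3200 from the direct call in the question.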
However, there are possibly faster ways to do this. One way is using np.vectorize, although it is mainly a convenience wrapper around a Python-level loop, so any gain from it tends to be modest. The other is using np.where, which operates on whole columns at once.
Option 1
np.vectorize
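import numpy as np
# otypes=[float] keeps 0.1 from being truncated when the first row returns an int
v = np.vectorize(feez, otypes=[float])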
df['fee'] = v(df.total_rides, df.plan_name)
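One caveat worth checking with np.vectorize: unless otypes is given, it infers the output dtype from the result of the first call. Since feez returns an int for a "4 Plan" row and 0.1 otherwise, a first row on the "4 Plan" can cause 0.1 to be truncated to 0. A small sketch with made-up inputs (the arrays below are assumptions, not data from the question):

```python
import numpy as np

def feez(rides, plan):
    pmt4, inc4, min_rate4 = 200, 50, 4
    if plan == "4 Plan":
        return (rides - inc4) * min_rate4 + pmt4 if rides > inc4 else pmt4
    return 0.1

# Hypothetical inputs: the first row is on the "4 Plan", the second is not.
rides = np.array([800, 30])
plans = np.array(["4 Plan", "Other"])

# Output dtype is inferred from the first result (an integer),
# so the later 0.1 is cast to an integer and becomes 0.
inferred = np.vectorize(feez)(rides, plans)

# Declaring the output type up front keeps the fractional value.
declared = np.vectorize(feez, otypes=[float])(rides, plans)

print(inferred, declared)
```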
Option 2
Nested np.where
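# Same constants as inside feez, defined at module level so np.where can see them
pmt4, inc4, min_rate4 = 200, 50, 4
df['fee'] = np.where(
    df.plan_name == "4 Plan",
    np.where(df.total_rides > inc4, (df.total_rides - inc4) * min_rate4 + pmt4, pmt4),
    0.1
)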
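As a rough sanity check, the sketch below runs the row-wise apply, np.vectorize, and nested np.where versions on the same made-up DataFrame and confirms they agree. The sample rows are hypothetical, and otypes=[float] is passed to np.vectorize so that 0.1 is not cast to an integer.

```python
import numpy as np
import pandas as pd

# Constants taken from the question's feez function.
pmt4, inc4, min_rate4 = 200, 50, 4

def feez(rides, plan):
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "total_rides": [800, 30, 50, 100],
    "plan_name": ["4 Plan", "4 Plan", "4 Plan", "Other"],
})

# Baseline: one Python call per row.
row_wise = df.apply(
    lambda x: feez(x["total_rides"], x["plan_name"]), axis=1
).astype(float).to_numpy()

# Option 1: np.vectorize, with the output type declared explicitly.
vectorized = np.vectorize(feez, otypes=[float])(df.total_rides, df.plan_name)

# Option 2: nested np.where, evaluated on whole columns at once.
nested = np.where(
    df.plan_name == "4 Plan",
    np.where(df.total_rides > inc4, (df.total_rides - inc4) * min_rate4 + pmt4, pmt4),
    0.1,
)

print(np.allclose(row_wise, vectorized), np.allclose(row_wise, nested))
```

Which option is actually fastest depends on the size of the DataFrame, so it is worth timing them on real data before committing to one.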