Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/47372274/

Time: 2020-09-14 04:47:20  Source: igfitidea

Apply custom function over multiple columns in pandas

python pandas function dataframe apply

Asked by eli

I am having trouble "applying" a custom function in Pandas. When I test the function by passing values directly, it works and returns the correct response. However, when I attempt to pass the column values, I receive the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

def feez(rides, plan):
    pmt4      = 200
    inc4      = 50   # number of rides included
    min_rate4 = 4

    if plan == "4 Plan":
        if rides > inc4:
            fee = ((rides - inc4) * min_rate4) + pmt4
        else:
            fee = pmt4
        return fee
    else:
        return 0.1

df['fee'].apply(feez(df.total_rides, df.plan_name))

Passing the values directly works: feez(800, "4 Plan") returns 3200.

However, I receive errors when I try to apply the function above.

I am a newbie and suspect my syntax is poorly written. Any ideas much appreciated. TIA. Eli

Answered by cs95

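apply is meant to work on one row at a time, so passing entire columns as you are doing will not work. In your call, feez(df.total_rides, df.plan_name) runs first with whole Series as arguments, and `if plan == "4 Plan"` then tries to reduce a boolean Series to a single True/False, which raises the error you see. In these instances, it's best to call apply on the DataFrame with axis=1 and a lambda.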

df['fee'] = df.apply(lambda x: feez(x['total_rides'], x['plan_name']), axis=1)
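
To see both the failure and the row-wise fix in one place, here is a minimal, self-contained sketch. The sample DataFrame is made up for illustration (only the column names total_rides and plan_name come from the question), and feez is restated with the same logic so the snippet runs on its own.

```python
import pandas as pd

def feez(rides, plan):
    pmt4 = 200       # flat payment for the "4 Plan"
    inc4 = 50        # number of rides included
    min_rate4 = 4    # charge per ride beyond the included ones
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "total_rides": [800, 30, 100],
    "plan_name": ["4 Plan", "4 Plan", "Other Plan"],
})

# Passing whole columns makes `if plan == "4 Plan"` test a boolean Series,
# which pandas refuses to collapse into a single True/False.
try:
    feez(df.total_rides, df.plan_name)
    raised = False
except ValueError:
    raised = True

# Row-wise call: each x is one row, so feez receives scalar values.
df["fee"] = df.apply(lambda x: feez(x["total_rides"], x["plan_name"]), axis=1)
print(df)
```

With axis=1 the lambda is called once per row, which is easy to read but does the work in a Python-level loop.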

However, there are possibly faster ways to do this. One way is using np.vectorize. The other is using np.where.

Option 1
np.vectorize

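import numpy as np
# otypes keeps NumPy from inferring an int dtype from the first result, which would truncate 0.1
v = np.vectorize(feez, otypes=[float])
df['fee'] = v(df.total_rides, df.plan_name)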
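
As a rough, self-contained sketch of this option: the DataFrame below is invented sample data, and feez is restated so the snippet runs alone. One assumption worth flagging is the otypes=[float] argument, which is my addition. NumPy's documentation says that without otypes the output type is taken from the first call's result, so an integer first fee could cause a later 0.1 to be cast to an integer.

```python
import numpy as np
import pandas as pd

def feez(rides, plan):
    pmt4 = 200       # flat payment for the "4 Plan"
    inc4 = 50        # number of rides included
    min_rate4 = 4    # charge per ride beyond the included ones
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "total_rides": [800, 30, 100],
    "plan_name": ["4 Plan", "4 Plan", "Other Plan"],
})

# Wrap the scalar function so it can be called with whole columns.
# otypes=[float] (an added assumption) pins the output dtype to float.
v = np.vectorize(feez, otypes=[float])
df["fee"] = v(df.total_rides, df.plan_name)
print(df)
```

NumPy's documentation describes vectorize as a convenience rather than a performance tool (it is essentially a loop), so any speedup over apply is worth measuring on your own data rather than assuming.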


Option 2
Nested np.where

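# pmt4, inc4 and min_rate4 are local to feez, so define them at module level first
pmt4, inc4, min_rate4 = 200, 50, 4

df['fee'] = np.where(
    df.plan_name == "4 Plan",
    np.where(df.total_rides > inc4, (df.total_rides - inc4) * min_rate4 + pmt4, pmt4),
    0.1
)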
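
A runnable sketch of the nested np.where approach, checked against the scalar feez function. The sample data and the module-level constants are assumptions for illustration. The constants mirror the values defined inside feez in the question, and they have to exist outside the function for the column-wise expression to see them.

```python
import numpy as np
import pandas as pd

# Same values as the locals inside feez, lifted to module level.
pmt4, inc4, min_rate4 = 200, 50, 4

def feez(rides, plan):
    # Scalar reference implementation, used here only to cross-check.
    if plan == "4 Plan":
        if rides > inc4:
            return (rides - inc4) * min_rate4 + pmt4
        return pmt4
    return 0.1

# Hypothetical sample data, assumed for illustration only.
df = pd.DataFrame({
    "total_rides": [800, 30, 100],
    "plan_name": ["4 Plan", "4 Plan", "Other Plan"],
})

# Outer where: pick the plan fee or the 0.1 fallback.
# Inner where: add the per-ride charge only above the included rides.
df["fee"] = np.where(
    df.plan_name == "4 Plan",
    np.where(df.total_rides > inc4, (df.total_rides - inc4) * min_rate4 + pmt4, pmt4),
    0.1,
)

# Cross-check every row against the scalar function.
expected = [feez(r, p) for r, p in zip(df.total_rides, df.plan_name)]
matches = df["fee"].tolist() == expected
print(df)
```

Both branches of each np.where are computed for every row and then selected by the condition, so this form suits simple arithmetic like the fee rule here.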