pandas: How to write a lambda function in Python that is conditional on two variables (columns)

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24790676/

Time: 2020-09-13 22:15:55  Source: igfitidea

How to write a lambda function that is conditional on two variables (columns) in python

python lambda pandas conditional multiple-columns

Asked by seeiespi

I have a data set, df, with two variables, x and y. I want to write a function that does the following:


x if x>100 and y<50 else y


I am used to doing data analysis in STATA so I'm relatively new to pandas for data analysis. If it helps, in stata it would look like:


replace x = cond(x>100 & y<50, x, y)


In other words, the function is conditional on two columns in df and will return a value from one variable or the other in each row depending on whether the condition is met.


So far I have been creating new variables through new functions like:


df.dummyVar = df.x.apply(lambda x: 1 if x>100 else 0)


Using StackOverflow and the documentation I have only been able to find how to apply a function dependent on a single variable to more than one column (using the axis option). Please help.


Answered by EdChum

Use where:


df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])

This will be much faster than performing an apply operation as it is vectorised.

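To make the where approach concrete, here is a minimal sketch on a small made-up DataFrame. The sample values are assumptions for illustration only; the column names x and y follow the question. Each comparison is wrapped in parentheses because & binds more tightly than > and <.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"x": [150, 150, 50, 101], "y": [10, 80, 20, 49]})

# Boolean mask: True where both conditions hold for that row.
mask = (df["x"] > 100) & (df["y"] < 50)

# Keep x where the mask is True, otherwise take the value from y.
df["dummyVar"] = df["x"].where(mask, df["y"])

print(df)
```

Only the first and last rows satisfy both conditions, so those rows keep x and the other two take y.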

Answered by James Mills

Like this:


f = lambda x, y: x if x>100 and y<50 else y

A lambda in Python is equivalent to a normal function definition:


def f(x, y):
    return x if x>100 and y<50 else y

NB: The body of a lambda must be a single valid expression. This means you cannot use statements such as return; a lambda returns the value of that expression.

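The lambda above works on one pair of scalar values at a time, so it still has to be called once per row. One way to do that, sketched here with made-up sample data (an assumption, not part of the original answer), is to pair up the two columns with zip:

```python
import pandas as pd

# Scalar function from the answer: takes one x and one y value.
f = lambda x, y: x if x > 100 and y < 50 else y

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"x": [150, 150, 50, 101], "y": [10, 80, 20, 49]})

# Call f once per row by iterating over both columns together.
df["dummyVar"] = [f(x, y) for x, y in zip(df["x"], df["y"])]

print(df)
```

This is a plain Python loop, so it is likely to be slower than a vectorised expression on large frames, but it keeps the condition readable as ordinary Python.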

For some good reading see:


Answered by seeiespi

There's now a pretty easy way to do this. Just use apply on the dataset:


df['dummy'] = df.apply(lambda row: row['x'] if row['x'] > 100 and row['y'] < 50 else row['y'], axis=1)
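For a row-wise lambda like this, DataFrame.apply needs axis=1 so that the function receives one row at a time; with the default axis=0 it receives whole columns instead. The sketch below, using made-up sample data (an assumption for illustration), runs the row-wise version and checks it against the vectorised where form.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"x": [150, 150, 50, 101], "y": [10, 80, 20, 49]})

# axis=1 passes each row to the lambda as a Series indexed by column name.
df["dummy"] = df.apply(
    lambda row: row["x"] if row["x"] > 100 and row["y"] < 50 else row["y"],
    axis=1,
)

# Vectorised equivalent, computed for comparison.
expected = df["x"].where((df["x"] > 100) & (df["y"] < 50), df["y"])

# True when both approaches agree on every row.
same = bool((df["dummy"] == expected).all())
print(df)
print(same)
```

Both forms give the same column here; the apply version is easier to read for complex row logic, while the where version avoids a Python-level call per row.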