pandas: how to write a lambda function in Python that is conditional on two variables (columns)

Question

Asked by seeiespi

I have a data set, df, with two variables, x and y. I want to write a function that does the following:

x if x>100 and y<50 else y

I am used to doing data analysis in Stata, so I'm relatively new to pandas. If it helps, in Stata it would look like:

replace x = cond(x>100 & y<50, x, y)

In other words, the function is conditional on two columns in df and will return a value from one variable or the other in each row depending on whether the condition is met.

So far I have been creating new variables through new functions like:

df['dummyVar'] = df['x'].apply(lambda x: 1 if x > 100 else 0)

Using StackOverflow and the documentation I have only been able to find how to apply a function dependent on a single variable to more than one column (using the axis option). Please help.

Answer 1

Answered by EdChum

Use where:

df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])

This will be much faster than performing an apply operation as it is vectorised.
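As a minimal sketch of the `where` approach, the snippet below builds a small DataFrame and applies the condition to it. The sample values are made up for illustration and are not from the question. `numpy.where` is shown as an equivalent alternative, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"x": [150, 150, 50, 120], "y": [10, 80, 20, 49]})

# Boolean mask: True where both conditions hold for that row
cond = (df["x"] > 100) & (df["y"] < 50)

# Keep x where the mask is True, otherwise take y
df["dummyVar"] = df["x"].where(cond, df["y"])

# Equivalent formulation with NumPy: np.where(condition, if_true, if_false)
df["dummyVar_np"] = np.where(cond, df["x"], df["y"])

print(df)
```

Note that each comparison needs its own parentheses, because `&` binds more tightly than `>` and `<` in Python.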

Answer 2

Answered by James Mills

Like this:

f = lambda x, y: x if x>100 and y<50 else y

A lambda in Python is equivalent to a normal function definition:

def f(x, y):
    return x if x>100 and y<50 else y

NB: The body of a lambda must be a single valid expression. This means you cannot use statements such as return; a lambda returns the value of that expression.
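The function above works on scalar values, so it cannot be called on whole columns directly: `f(df['x'], df['y'])` would raise a `ValueError`, because `and` needs a single boolean rather than a Series. One possible way to use it with a DataFrame is to pair the columns up row by row, as sketched below. The sample data is hypothetical, and this is only one option rather than the method the answer prescribes.

```python
import pandas as pd

def f(x, y):
    # Scalar version of the rule: x if both conditions hold, otherwise y
    return x if x > 100 and y < 50 else y

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"x": [150, 150, 50, 120], "y": [10, 80, 20, 49]})

# zip walks the two columns in parallel, yielding one (x, y) pair per row
df["dummyVar"] = [f(x, y) for x, y in zip(df["x"], df["y"])]

print(df)
```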

For some good reading see:

Answer 3

Answered by seeiespi

There's now a pretty easy way to do this. Just use apply on the dataset with axis=1, so that the lambda receives one row at a time (with the default axis=0 it would receive each column instead):

df['dummy'] = df.apply(lambda row: row['x'] if row['x'] > 100 and row['y'] < 50 else row['y'], axis=1)
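For reference, here is a small runnable sketch of the row-wise `apply` on made-up data, checked against the vectorised `where` form from the first answer. The sample values and the helper name `pick` are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"x": [150, 150, 50, 120], "y": [10, 80, 20, 49]})

def pick(row):
    # row is a Series holding one row; its labels are the column names
    return row["x"] if row["x"] > 100 and row["y"] < 50 else row["y"]

# axis=1 passes each row to the function
df["dummy"] = df.apply(pick, axis=1)

# The vectorised form should give the same values
vectorised = df["x"].where((df["x"] > 100) & (df["y"] < 50), df["y"])
print((df["dummy"] == vectorised).all())
```

Row-wise `apply` calls the Python function once per row, so on large frames the vectorised form is usually the faster choice.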
