How to properly apply a lambda function to a pandas DataFrame column

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/37428218/

Date: 2020-09-08 15:42:44, Source: igfitidea

Tags: pandas, lambda

Question by Amani

I have a pandas data frame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

Answer by jezrael

You need Series.mask, which replaces values where the condition is True:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution with loc and boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
      PR
0    NaN
1  100.0
2    NaN
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
      PR
0    NaN
1  100.0
2    NaN
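
To check that mask, loc and the apply variant from the edit below really agree, here is a minimal sketch. It runs each one on its own copy of the same data, assuming recent pandas and NumPy; the names original, via_mask, via_loc and via_apply are illustrative. It starts from float values so the loc assignment does not depend on pandas' rules for upcasting an int64 column.

```python
import numpy as np
import pandas as pd

# Float version of the sample data (avoids int64 upcasting on assignment)
original = pd.DataFrame({'PR': [10.0, 100.0, 40.0]})

# 1) Series.mask: replace values where the condition is True
via_mask = original.copy()
via_mask['PR'] = via_mask['PR'].mask(via_mask['PR'] < 90, np.nan)

# 2) loc + boolean indexing: assign NaN only to the selected rows
via_loc = original.copy()
via_loc.loc[via_loc['PR'] < 90, 'PR'] = np.nan

# 3) apply with a complete conditional expression (if ... else ...)
via_apply = original.copy()
via_apply['PR'] = via_apply['PR'].apply(lambda x: np.nan if x < 90 else x)

# Raises AssertionError if any two results differ
pd.testing.assert_frame_equal(via_mask, via_loc)
pd.testing.assert_frame_equal(via_mask, via_apply)
print(via_mask['PR'].tolist())  # [nan, 100.0, nan]
```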

EDIT:

Solution with apply:

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
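
A Python conditional expression always needs both branches (a if cond else b). Without else, the parser rejects the whole line, which is the SyntaxError in the question. The sketch below uses the built-in compile() to show this without crashing the script. The helper parses is hypothetical and only here for illustration.

```python
def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        # compile() only parses; undefined names like NaN are not looked up
        compile(source, '<lambda-check>', 'eval')
        return True
    except SyntaxError:
        return False

# The question's lambda: conditional expression without an else branch
print(parses("lambda x: NaN if x < 90"))            # False
# The fixed lambda: both branches present
print(parses("lambda x: np.nan if x < 90 else x"))  # True
```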

Timings (len(sample) = 300k):

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
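
Outside IPython, %timeit is not available. A comparable measurement can be sketched with the standard-library timeit module, using the same 300k-row setup. This is a rough benchmark: absolute numbers depend on the machine and the pandas version, so no expected output is given. The helper names with_apply and with_mask are illustrative.

```python
import timeit

import numpy as np
import pandas as pd

# Build 300k rows by repeating the 3-row sample, as in the timings above
sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)
col = sample['PR']

def with_apply():
    # Calls the Python lambda once per element
    return col.apply(lambda x: np.nan if x < 90 else x)

def with_mask():
    # One vectorized comparison plus one vectorized replacement
    return col.mask(col < 90, np.nan)

# Both approaches must agree before their speed is compared
assert with_apply().equals(with_mask())

for name, func in [('apply', with_apply), ('mask', with_mask)]:
    # Best of 3 repeats, 5 calls each; report the time per call
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f'{name}: {best * 1000:.2f} ms per call')
```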

Answer by kali prasad deverasetti

You need to add else to your lambda function. You are telling it what to do when your condition (here x < 90) is met, but not what to do when the condition is not met.

import numpy as np
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)  # np.nan, not the string 'NaN'
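
What the lambda returns for values below 90 matters. Pandas treats np.nan as a missing value, but the string 'NaN' is just text. The sketch below compares the two on a small illustrative Series, pr.

```python
import numpy as np
import pandas as pd

pr = pd.Series([10, 100, 40], name='PR')

# String 'NaN': plain text, so the column becomes object dtype
as_string = pr.apply(lambda x: 'NaN' if x < 90 else x)
# np.nan: a real missing value, so the column stays numeric (float64)
as_nan = pr.apply(lambda x: np.nan if x < 90 else x)

print(as_string.dtype, as_string.isna().sum())  # object 0
print(as_nan.dtype, as_nan.isna().sum())        # float64 2

# Numeric reductions skip real NaN values by default
print(as_nan.mean())  # 100.0
```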