How to properly apply a lambda function to a pandas data frame column
Note: this page is an English/Chinese side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/37428218/
Asked by Amani
I have a pandas data frame, sample, with a column called PR, to which I am applying a lambda function as follows:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
I then get the following syntax error message:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
^
SyntaxError: invalid syntax
What am I doing wrong?
Answered by jezrael
You need mask:
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
Another solution with loc and boolean indexing:
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
Sample:
import pandas as pd
import numpy as np
sample = pd.DataFrame({'PR': [10, 100, 40]})
print(sample)
    PR
0   10
1  100
2   40
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print(sample)
      PR
0    NaN
1  100.0
2    NaN
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print(sample)
      PR
0    NaN
1  100.0
2    NaN
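Note that 100 is printed as 100.0 after the replacement. A minimal sketch of why, assuming the default integer dtype that pandas infers for this column: NaN is a floating-point value, so the column is upcast to float64 as soon as it has to hold one. The names before, after and masked are only for illustration.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})
before = sample['PR'].dtype          # integer dtype inferred from the ints

# mask returns a new Series; values where the condition holds become NaN
masked = sample['PR'].mask(sample['PR'] < 90, np.nan)
after = masked.dtype                 # float64, because NaN is a float

print(before, '->', after)
print(masked.isna().tolist())
```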
EDIT:
Solution with apply:
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
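As a quick sanity check, the three approaches (apply, mask, and loc with a boolean index) should give the same result on the sample data. This is only a sketch; the variable names via_apply, via_mask and via_loc are made up for the comparison.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})

via_apply = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
via_mask = sample['PR'].mask(sample['PR'] < 90, np.nan)

# Work on a float copy so that NaN can be assigned without touching sample
via_loc = sample['PR'].astype(float)
via_loc.loc[sample['PR'] < 90] = np.nan

# Series.equals treats NaN values in the same position as equal
print(via_apply.equals(via_mask), via_mask.equals(via_loc))
```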
Timings (len(df)=300k):
sample = pd.concat([sample]*100000).reset_index(drop=True)
In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop
In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
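The timings above come from IPython's %timeit. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The frame here is deliberately smaller than 300k rows to keep the run short, and the helper names with_apply and with_mask are hypothetical; absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 30k rows instead of 300k, just to keep the example quick
sample = pd.DataFrame({'PR': [10, 100, 40] * 10_000})

def with_apply():
    # Calls the Python lambda once per element
    return sample['PR'].apply(lambda x: np.nan if x < 90 else x)

def with_mask():
    # Vectorized: the comparison and replacement run on whole arrays
    return sample['PR'].mask(sample['PR'] < 90, np.nan)

# Best of 3 repeats, 5 calls each, reported per call
t_apply = min(timeit.repeat(with_apply, number=5, repeat=3)) / 5
t_mask = min(timeit.repeat(with_mask, number=5, repeat=3)) / 5

print(f'apply: {t_apply * 1e3:.2f} ms per call')
print(f'mask:  {t_mask * 1e3:.2f} ms per call')
```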
Answered by kali prasad deverasetti
You need to add else to your lambda function. You are saying what to do when the condition (here x < 90) is met, but not what to do when it is not met.
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
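To illustrate the point about else: a Python conditional expression has the form `A if condition else B`, and the else branch is mandatory, which is why the lambda in the question fails to compile. A minimal sketch using only the standard library; replace_low is a hypothetical name.

```python
import math

# Without an else branch the expression is rejected by the compiler
try:
    compile("lambda x: float('nan') if x < 90", "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False

# With both branches it is a valid expression
replace_low = lambda x: float('nan') if x < 90 else x

print(compiled)
print(replace_low(100), math.isnan(replace_low(10)))
```

A quoted 'NaN' would be an ordinary string rather than a missing value, so np.nan (or float('nan')) is the safer choice if the column should stay numeric.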