Why do I get an AttributeError when using pandas apply?
Original question: http://stackoverflow.com/questions/48052125/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by LPR
How should I convert NaN values into a categorical value based on a condition? I am getting an error while trying to convert the NaN values.
category       gender  sub-category  title
health&beauty  NaN     makeup        lipbalm
health&beauty  women   makeup        lipstick
NaN            NaN     NaN           lipgloss
My DataFrame looks like this. My function to convert the NaN values in gender to a categorical value looks like this:
def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)
When I run the code, I get this error:
----> 7 if title.str.contains('Lip') and gender.isnull()==True:
8 print(gender)
9
AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')
Complete dataset: https://github.com/lakshmipriya04/py-sample
Answer by cs95
Some things to note here:
- If you're only using two columns, calling apply over 4 columns is wasteful.
- Calling apply is wasteful in general, because it is slow and offers no vectorisation benefits.
- Inside apply, you're dealing with scalars, so you do not use the .str accessor as you would on a pd.Series object. A plain substring test, "lip" in title, is enough.
- gender.isnull is completely wrong: gender is a scalar, so it has no isnull attribute.
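To make the last two points concrete, here is a minimal sketch of a row-wise function that treats each value as a scalar. The sample frame is rebuilt from the question's table, and the name impute_gender_scalar is hypothetical. It only illustrates the scalar handling; the vectorised options that follow remain the better choice.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question (dtypes are an assumption).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender_scalar(row):
    """Hypothetical row-wise fix: every field of `row` is a plain scalar."""
    title, gender = row["title"], row["gender"]
    # pd.isnull works on scalars; `in` does the substring test on a str.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # keep the existing value instead of returning None

result = df.apply(impute_gender_scalar, axis=1)
print(result.tolist())
```

Returning gender in the fall-through branch matters: the function in the question returns None for every row that fails the condition, which would wipe out values that were already filled in.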
Option 1: np.where
import numpy as np

m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)
df
   category       gender  sub-category  title
0  health&beauty  women   makeup        lipbalm
1  health&beauty  women   makeup        lipstick
2  NaN            women   NaN           lipgloss
Which is not only fast, but simpler as well. If you're worried about case sensitivity, you can make your contains check case-insensitive:
import re

m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
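As a hedged alternative to passing flags, str.contains also accepts case=False, and na=False makes missing titles count as non-matches. The sketch below uses made-up rows with mixed-case titles and one missing title; they are not from the question's data.

```python
import numpy as np
import pandas as pd

# Made-up sample (not the question's data): mixed-case titles and a missing one.
df = pd.DataFrame({
    "gender": [np.nan, "men", np.nan, np.nan],
    "title": ["LipBalm", "Lip balm for men", "razor", np.nan],
})

# case=False ignores case; na=False treats a missing title as "no match",
# so the mask is purely boolean and safe to combine with &.
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)

# Only rows with a missing gender AND a matching title are overwritten.
df["gender"] = np.where(m, "women", df["gender"])
print(df)
```

The second row keeps its existing value even though its title matches, because the mask also requires gender to be missing.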
Option 2: pd.Series.mask / pd.Series.where
Another alternative is using pd.Series.mask or pd.Series.where:
df['gender'] = df.gender.mask(m, 'women')
Or,
df['gender'] = df.gender.where(~m, 'women')
df
   category       gender  sub-category  title
0  health&beauty  women   makeup        lipbalm
1  health&beauty  women   makeup        lipstick
2  NaN            women   NaN           lipgloss
mask implicitly applies the new value to the column wherever the provided mask is True; where does the opposite and keeps values where its condition is True, which is why it takes ~m.
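The relationship between the two methods can be checked directly. This is a small sketch on the gender and title columns rebuilt from the question's table; the names via_mask and via_where are only for illustration.

```python
import numpy as np
import pandas as pd

# Gender and title columns rebuilt from the question's table.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})
m = df["gender"].isnull() & df["title"].str.contains("lip")

# mask: replace values where the condition is True.
via_mask = df["gender"].mask(m, "women")
# where: keep values where the condition is True and replace the rest,
# hence the inverted condition ~m.
via_where = df["gender"].where(~m, "women")

print(via_mask.equals(via_where))
```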
Answer by Vaishali
Or simply use loc, as an option 3 to @COLDSPEED's answer:
cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'
   category       gender  sub-category  title
0  health&beauty  women   makeup        lipbalm
1  health&beauty  women   makeup        lipstick
2  NaN            women   NaN           lipgloss
Answer by YOBEN_S
If we are dealing with NaN values, fillna can be one of the methods :-)
df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
df
Out[63]:
   category       gender  sub-category  title
0  health&beauty  women   makeup        lipbalm
1  health&beauty  women   makeup        lipstick
2  NaN            women   NaN           lipgloss
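One caveat with this approach: rows whose title does not match are filled with False rather than staying NaN, because the fill values come from the boolean Series. The sketch below, on made-up data with one non-matching title, shows one possible workaround that maps only True to 'women'. It is a variation on the answer, not the author's code.

```python
import numpy as np
import pandas as pd

# Made-up sample: the last title does not contain "lip".
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "razor"],
})

matches = df["title"].str.contains("lip")

# map sends True -> "women" and any key absent from the dict (False) -> NaN,
# so non-matching rows have no fill value and stay missing.
fill_values = matches.map({True: "women"})
filled = df["gender"].fillna(fill_values)
print(filled.tolist())
```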