Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/48052125/

Time: 2020-09-14 04:59:57  Source: igfitidea

Why do I get an AttributeError when using pandas apply?

Tags: python, pandas, dataframe, apply, attributeerror

Asked by LPR

How should I convert NaN values into a categorical value based on a condition? I am getting an error while trying to convert the NaN values.

category         gender   sub-category   title
health&beauty    NaN      makeup         lipbalm
health&beauty    women    makeup         lipstick
NaN              NaN      NaN            lipgloss

My DataFrame looks like this. My function to convert NaN values in gender to a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

If I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Complete dataset: https://github.com/lakshmipriya04/py-sample

Answered by cs95

Some things to note here:

  1. If you're using only two columns, calling apply over 4 columns is wasteful.
  2. Calling apply is wasteful in general, because it is slow and offers no vectorisation benefits.
  3. In apply, you're dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series object. A plain str has no contains method either; the Pythonic check is "lip" in title.
  4. gender.isnull is completely wrong: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) instead.
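
For reference, if apply were kept anyway, points 3 and 4 could be addressed roughly as below. This is only a sketch: the small frame is rebuilt by hand from the sample in the question, the .lower() call is an added assumption to make the match case-insensitive, and the vectorised options that follow are still preferable.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question (column names as printed there).
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    # With axis=1, `row` is one DataFrame row, so its fields are scalars.
    gender, title = row["gender"], row["title"]
    # pd.isnull works on scalars; `in` is the plain-string substring test.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # leave existing values untouched

# Only the two columns the function needs are passed to apply.
imputed = df[["gender", "title"]].apply(impute_gender, axis=1)
print(imputed.tolist())
```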


Option 1
np.where

import numpy as np
# Boolean mask: gender is missing and title contains 'lip'
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only fast, but simpler as well. If you're worried about case sensitivity, you can make your contains check case-insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
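
Putting Option 1 together as a runnable sketch. The frame here is made up for illustration (it adds a mixed-case title, a missing title and a non-matching title to the question's sample), and na=False is an extra assumption so that a missing title counts as a non-match instead of producing NaN in the result of str.contains.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data: mixed-case titles, one missing title, one non-matching title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan, np.nan],
    "title": ["Lipbalm", "lipstick", "LIPGLOSS", np.nan, "shampoo"],
})

# Mask: gender is missing AND title contains 'lip' in any letter case.
m = df["gender"].isnull() & df["title"].str.contains(
    "lip", flags=re.IGNORECASE, na=False
)

# Where the mask is True take 'women', otherwise keep the existing value.
df["gender"] = np.where(m, "women", df["gender"])
print(df)
```

Passing case=False to str.contains should behave the same way as flags=re.IGNORECASE for a simple pattern like this one.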


Option 2
Another alternative is using pd.Series.mask / pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or,

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

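mask replaces the values where the condition is True with the new value and leaves the rest untouched; where does the opposite and replaces the values where the condition is False, which is why it takes ~m.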
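
A quick sketch to check that the two spellings in Option 2 agree. The frame is hand-built, and the 'shampoo' row is hypothetical, added so that at least one row falls outside the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's sample plus one non-matching row.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss", "shampoo"],
})

m = df["gender"].isnull() & df["title"].str.contains("lip")

masked = df["gender"].mask(m, "women")    # replace where m is True
kept = df["gender"].where(~m, "women")    # keep where ~m is True, replace elsewhere

# Series.equals treats NaNs in the same position as equal.
print(masked.equals(kept))
```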

Answered by Vaishali

Or simply use loc, as an Option 3 to @COLDSPEED's answer:

cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'


        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
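
A small self-contained check of the loc approach. The frame is hand-built and the extra 'shampoo' row is hypothetical, added to show that rows not matching cond keep their NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's sample plus one non-matching row.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss", "shampoo"],
})

cond = df["gender"].isnull() & df["title"].str.contains("lip")

# Assign in place, only on the rows where cond is True.
df.loc[cond, "gender"] = "women"
print(df)
```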

Answered by YOBEN_S

If we are dealing with NaN values, fillna can be one of the methods :-)

df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
df
Out[63]: 
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
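
One caveat with this fillna variant, as far as I can tell: for a row whose gender is NaN but whose title does not contain 'lip', the filler Series holds False, so False would be written into gender. A hedged alternative sketch is to map only True to 'women', so every other row stays NaN. The 'shampoo' row below is hypothetical, added to exercise that case, and na=False is an added assumption for missing titles.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's sample plus one non-matching row.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss", "shampoo"],
})

# True -> 'women'; False is not in the dict, so it maps to NaN.
filler = df["title"].str.contains("lip", na=False).map({True: "women"})

# fillna aligns on the index and only touches rows where gender is NaN.
df["gender"] = df["gender"].fillna(filler)
print(df)
```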