Why do I get an AttributeError when using pandas apply?

Question

Asked by LPR

How should I convert NaN values into a categorical value based on a condition? I am getting an error when I try to convert the NaN values.

category         gender   sub-category   title
health&beauty    NaN      makeup        lipbalm
health&beauty    women    makeup        lipstick
NaN              NaN      NaN           lipgloss

My DataFrame looks like this. My function to convert the NaN values in gender to a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Complete dataset: https://github.com/lakshmipriya04/py-sample

Answer 1

Answered by cs95

Some things to note here:

If you're using only two columns, calling apply over four columns is wasteful.
Calling apply is wasteful in general, because it is slow and offers no vectorisation benefits.
In apply, you're dealing with scalars, so you can't use the .str accessor as you would on a pd.Series object. A plain Python membership test, "lip" in title, is enough.
gender.isnull is completely wrong: gender is a scalar, so it has no isnull attribute. Use pd.isnull(gender) instead.
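
If you do want to keep the row-wise apply, the points above can be put together roughly as follows. This is only a sketch: the small frame is rebuilt from the sample rows in the question, and returning the existing gender when the condition fails is an assumption (the original function returned None in that case, which would overwrite valid values).

```python
import numpy as np
import pandas as pd

# Small frame modelled on the sample rows in the question.
df = pd.DataFrame({
    "category": ["health&beauty", "health&beauty", np.nan],
    "gender": [np.nan, "women", np.nan],
    "sub-category": ["makeup", "makeup", np.nan],
    "title": ["lipbalm", "lipstick", "lipgloss"],
})

def impute_gender(row):
    """Return 'women' for lip products with a missing gender, else the existing value."""
    title, gender = row["title"], row["gender"]
    # With axis=1 these are plain Python scalars, not Series,
    # so use pd.isnull() and the `in` operator instead of .isnull()/.str.
    if pd.isnull(gender) and isinstance(title, str) and "lip" in title.lower():
        return "women"
    return gender  # assumption: keep whatever was already there

# Only the two columns the function actually reads are passed to apply.
df["gender"] = df[["gender", "title"]].apply(impute_gender, axis=1)
print(df["gender"].tolist())
```

This still loops over the rows in Python, so the vectorised options below are the better choice for anything but small frames.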

Option 1
np.where

import numpy as np
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only fast, but simpler as well. If you're worried about case sensitivity, you can make your contains check case-insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
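
As a variation on the mask above, str.contains also accepts case=False and na=False. The sketch below uses made-up rows (a mixed-case title, a missing title, and a non-matching title) to show how the mask behaves; the data is an assumption, not taken from the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: mixed-case title, missing title, non-matching title.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan, np.nan],
    "title": ["LipBalm", "lipstick", np.nan, "shampoo"],
})

# case=False ignores case; na=False treats a missing title as "no match",
# so the mask stays strictly boolean.
m = df["gender"].isnull() & df["title"].str.contains("lip", case=False, na=False)
df["gender"] = np.where(m, "women", df["gender"])
print(df["gender"].tolist())
```

Only the first row is filled here: the second already has a value, and the last two have no matching title.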

Option 2
Another alternative is using pd.Series.mask / pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or,

df['gender'] = df.gender.where(~m, 'women')

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

mask implicitly applies the new value to the column wherever the provided mask is True.
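
To make the relationship between the two calls concrete, here is a minimal sketch on a standalone Series. The values and the hand-written boolean mask are illustrative assumptions.

```python
import numpy as np
import pandas as pd

gender = pd.Series([np.nan, "women", np.nan])
m = pd.Series([True, False, False])  # hypothetical mask: only row 0 qualifies

# mask replaces values where the condition is True ...
masked = gender.mask(m, "women")
# ... while where keeps values where the condition is True, hence the ~m.
kept = gender.where(~m, "women")

print(masked.tolist())
print(masked.equals(kept))
```

Both calls return a new Series and leave gender untouched, which is why the answer assigns the result back to df['gender'].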

Answer 2

Answered by Vaishali

Or simply use loc, as an Option 3 to @COLDSPEED's answer:

cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'


        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

Answer 3

Answered by YOBEN_S

If we are dealing with NaN values, fillna can be one of the methods :-)

df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
df
Out[63]: 
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
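
One thing to watch with this approach: every title in the sample contains 'lip', so the fill value is always 'women'. The sketch below adds a hypothetical non-matching title to show what happens otherwise, and a possible variant using map (my assumption, not part of the answer) that leaves such rows as NaN. The columns are forced to object dtype so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last title does not contain 'lip'.
df = pd.DataFrame({
    "gender": [np.nan, "women", np.nan],
    "title": ["lipbalm", "lipstick", "shampoo"],
}, dtype=object)

is_lip = df["title"].str.contains("lip")

# As in the answer: True becomes 'women', but False is kept and used as a fill value.
filled = df["gender"].fillna(is_lip.replace(True, "women"))
print(filled.tolist())

# Variant: map only True to 'women'; unmatched rows map to NaN and stay missing.
safe = df["gender"].fillna(is_lip.map({True: "women"}))
print(safe.tolist())
```

With the replace version the last row ends up holding False rather than a gender, so the map variant (or one of the mask-based options above) may be the safer choice when not every title matches.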
