Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/52753613/

Time: 2020-09-14 06:04:29  Source: igfitidea

Grouping / Categorising ages column in Python Pandas

python pandas dataframe

Asked by Anand Siddharth

I have a dataframe, say df. df has a column 'Age':

>>> df['Age']

[Image: Age data]

I want to group these ages and create a new column, something like this:

If age >= 0 & age < 2 then AgeGroup = Infant
If age >= 2 & age < 4 then AgeGroup = Toddler
If age >= 4 & age < 13 then AgeGroup = Kid
If age >= 13 & age < 20 then AgeGroup = Teen
and so on .....

How can I achieve this using the Pandas library?

I tried something like this:

X_train_data['AgeGroup'][ X_train_data.Age < 13 ] = 'Kid'
X_train_data['AgeGroup'][ X_train_data.Age < 3 ] = 'Toddler'
X_train_data['AgeGroup'][ X_train_data.Age < 1 ] = 'Infant'

but doing this I get this warning:

/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning and do this in a better way?

Answered by jezrael

Use pandas.cut with the parameter right=False so that the rightmost edge of each bin is not included:

import pandas as pd

X_train_data = pd.DataFrame({'Age':[0,2,4,13,35,-1,54]})

# Bin edges; with right=False each bin is [left, right)
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant','Toddler','Kid','Teen','Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)
print(X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1      NaN
6   54    Adult
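
To make the effect of right=False concrete, here is a minimal sketch that bins the same boundary ages both ways. The names ages, left_closed, right_closed and comparison are illustrative and not part of the original answer; the bins and labels are the ones used above.

```python
import pandas as pd

# Ages that sit exactly on the bin edges
ages = pd.Series([0, 2, 4, 13, 20])
bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

# right=False: bins are [0, 2), [2, 4), [4, 13), [13, 20), [20, 110)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# right=True (the default): bins are (0, 2], (2, 4], (4, 13], (13, 20], (20, 110]
right_closed = pd.cut(ages, bins=bins, labels=labels)

comparison = pd.DataFrame({
    'Age': ages,
    'right=False': left_closed,
    'right=True': right_closed,
})
print(comparison)
```

With the default right=True, an age of 0 falls outside the first bin and becomes NaN, and every edge value lands one group lower (2 becomes Infant, 13 becomes Kid), which does not match the ranges asked for in the question.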

Finally, to replace the missing values, use add_categories with fillna:

# 'unknown' must be added as a category before fillna can use it
X_train_data['AgeGroup'] = (X_train_data['AgeGroup'].cat.add_categories('unknown')
                                                    .fillna('unknown'))
print (X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult

Alternatively, add an extra bin [-1, 0) labelled 'unknown', so that the placeholder age -1 is categorised directly:


bins= [-1,0,2,4,13,20, 110]
labels = ['unknown','Infant','Toddler','Kid','Teen', 'Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)

print (X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult
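
Note that the extra bin only covers ages in [-1, 0). The following sketch, using a hypothetical frame df with some out-of-range ages, suggests that anything outside all the bins still comes back as NaN, and that fillna can then be used without add_categories because 'unknown' is already one of the labels.

```python
import pandas as pd

# Hypothetical data: -5, 110 and 130 lie outside every bin below
df = pd.DataFrame({'Age': [-5, -1, 0, 19, 110, 130]})
bins = [-1, 0, 2, 4, 13, 20, 110]
labels = ['unknown', 'Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

# Rows that matched no bin (the last bin is [20, 110), so 110 itself is excluded)
outside = df['AgeGroup'].isna()

# 'unknown' is already a category, so fillna needs no add_categories call
df['AgeGroup'] = df['AgeGroup'].fillna('unknown')
print(df)
```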

Answered by quest

Just use:

X_train_data.loc[(X_train_data.Age < 13),  'AgeGroup'] = 'Kid'
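
The same .loc pattern can be extended to the other groups. This is only a sketch: the ranges are taken from the question, while the sample ages and the helper variable age are assumptions for illustration. Each assignment selects rows and column in one .loc call on the original frame, which is what avoids the SettingWithCopyWarning raised by the chained form in the question.

```python
import pandas as pd

X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35]})
age = X_train_data['Age']  # shorthand for building the boolean masks

# One .loc call per group: row mask and column label together
X_train_data.loc[(age >= 0) & (age < 2), 'AgeGroup'] = 'Infant'
X_train_data.loc[(age >= 2) & (age < 4), 'AgeGroup'] = 'Toddler'
X_train_data.loc[(age >= 4) & (age < 13), 'AgeGroup'] = 'Kid'
X_train_data.loc[(age >= 13) & (age < 20), 'AgeGroup'] = 'Teen'
print(X_train_data)
```

Rows that match none of the masks (age 35 here) are left as NaN, so further groups need their own line.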