Python Pandas: adding a field to a DataFrame based on multiple if statements

Question

Asked by user3302483

I'm quite new to Python and Pandas so this might be an obvious question.

I have a dataframe with ages listed in it. I want to create a new field with an age banding. I can use a lambda to capture a single if / else statement, but I want to use multiple ifs, e.g. if age < 18 then 'under 18' elif age < 40 then 'under 40' else '>40'.

I don't think I can do this using lambda but am not sure how to do it in a different way. I have this code so far:

import pandas as pd
import numpy as np

d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }

df = pd.DataFrame(d)

df['Age_Group'] = df['Age'].map(lambda x: '<18' if x < 18 else '>18')

print(df)

Answer 1

Accepted answer by Ryan G

The pandas DataFrame provides a nice querying ability.

What you are trying to do can be done simply with:

# Note: this chained form (df[col][mask] = ...) can raise SettingWithCopyWarning
# and does not update df when pandas Copy-on-Write is enabled; the df.loc form
# shown further down avoids that.
# Set a default value
df['Age_Group'] = '<40'
# Set Age_Group for all rows whose Age is greater than 40
df['Age_Group'][df['Age'] > 40] = '>40'
# Set Age_Group for all rows whose Age is greater than 18 and less than 40
df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
# Set Age_Group for all rows whose Age is less than 18
df['Age_Group'][df['Age'] < 18] = '<18'

Querying like this is a powerful feature of the DataFrame and lets you manipulate the data as you need.

For more complex conditionals, you can specify multiple conditions by wrapping each condition in parentheses and separating them with a boolean operator (e.g. '&' or '|').

You can see this at work above in the second conditional statement, the one that sets '>18'.
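
As a rough illustration of combining conditions, here is a minimal sketch that builds the masks on their own before using them. The variable names between_18_and_40 and under_18_or_over_40 are made up for this example and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Each comparison yields a boolean Series. The parentheses are needed
# because & and | bind more tightly than > and <.
between_18_and_40 = (df['Age'] > 18) & (df['Age'] < 40)
under_18_or_over_40 = (df['Age'] < 18) | (df['Age'] > 40)

print(between_18_and_40.tolist())
print(under_18_or_over_40.tolist())
```

Either mask can then be used to select rows, for example df[between_18_and_40].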

Edit:

You can read more about DataFrame indexing and conditionals here:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#index-objects

Edit:

To see how it works:

>>> d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }
>>> df = pd.DataFrame(d)
>>> df
   Age
0   36
1   42
2    6
3   66
4   38
>>> df['Age_Group'] = '<40'
>>> df['Age_Group'][df['Age'] > 40] = '>40'
>>> df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
>>> df['Age_Group'][df['Age'] < 18] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18

Edit:

To do this without chained indexing (using EdChum's approach):

>>> df['Age_Group'] = '<40'
>>> df.loc[df['Age'] > 40, 'Age_Group'] = '>40'
>>> df.loc[(df['Age'] > 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
>>> df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18
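
One thing to watch with the strict comparisons above: an age of exactly 18 or exactly 40 matches none of the three masks and keeps the default label. A minimal sketch of one way around that, assuming you want the bands to cover every age, is to start from the top band and overwrite downwards with .loc. The labels '>=40' and '18-39' and the two extra ages are made up for this example.

```python
import pandas as pd

# 18 and 40 are added to the sample data to show the boundary cases
df = pd.DataFrame({'Age': [36., 42., 6., 66., 38., 18., 40.]})

# Start with the top band, then overwrite the lower bands in turn,
# so every age ends up in exactly one band.
df['Age_Group'] = '>=40'
df.loc[df['Age'] < 40, 'Age_Group'] = '18-39'
df.loc[df['Age'] < 18, 'Age_Group'] = '<18'

print(df)
```

Because each later assignment only narrows the previous one, no mask needs to combine two conditions.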

Answer 2

Answered by S.Zuo

You can also use a nested np.where():

df['Age_group'] = np.where(df.Age<18, 'under 18',
                           np.where(df.Age<40,'under 40', '>40'))
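
If the nesting gets hard to read with more bands, np.select is a possible alternative. It is a different NumPy function from the one this answer uses, so treat the following as a sketch under the same sample data rather than part of the answer. The names conditions and labels are chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Conditions are checked in order; the first one that is True picks the label.
conditions = [df['Age'] < 18, df['Age'] < 40]
labels = ['under 18', 'under 40']

# Rows matching no condition get the default label.
df['Age_group'] = np.select(conditions, labels, default='>40')

print(df)
```

Adding another band then means adding one entry to each list instead of another level of nesting.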
