Python Pandas dataframe: add a field based on multiple if statements
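Disclaimer: this page is a translation of a popular StackOverFlow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow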
Original URL: http://stackoverflow.com/questions/21733893/
Question by user3302483
I'm quite new to Python and Pandas so this might be an obvious question.
I have a dataframe with ages listed in it. I want to create a new field with an age band. I can use a lambda to express a single if/else, but I want multiple conditions, e.g. if age < 18 then 'under 18', elif age < 40 then 'under 40', else '>40'.
I don't think I can do this using lambda but am not sure how to do it in a different way. I have this code so far:
import pandas as pd
import numpy as np
d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }
df = pd.DataFrame(d)
# A lambda handles a single if/else: this splits the ages into two bands at 18
df['Age_Group'] = df['Age'].map(lambda x: '<18' if x < 18 else '>=18')
print(df)
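Since Series.map accepts any callable, not just a lambda, the if/elif/else chain can live in an ordinary named function. A minimal sketch using the 'under 18' / 'under 40' / '>40' bands from the question; the function name age_group is hypothetical:

```python
import pandas as pd

def age_group(age):
    """Return an age band label using a plain if/elif/else chain (hypothetical helper)."""
    if age < 18:
        return 'under 18'
    elif age < 40:
        return 'under 40'
    else:
        return '>40'

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})
# Series.map calls age_group once for each value in the column.
df['Age_Group'] = df['Age'].map(age_group)
print(df)
```

A lambda can also nest conditional expressions, e.g. lambda x: 'under 18' if x < 18 else ('under 40' if x < 40 else '>40'), but that gets hard to read as more bands are added.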
Accepted answer by Ryan G
The pandas DataFrame provides powerful querying through boolean indexing (selecting rows with a True/False mask).
What you are trying to do can be done simply with:
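# Note: this chained-assignment form (df['col'][mask] = value) is discouraged by the
# pandas docs and, with Copy-on-Write (the default behavior in pandas 3.0), it no
# longer modifies df; prefer the .loc version shown further down.
# Set a default value (rows matching none of the masks below, e.g. Age exactly 18 or 40, keep it)
df['Age_Group'] = '<40'
# Set Age_Group to '>40' for all rows whose Age is greater than 40
df['Age_Group'][df['Age'] > 40] = '>40'
# Set Age_Group to '>18' for all rows whose Age is greater than 18 and less than 40
df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
# Set Age_Group to '<18' for all rows whose Age is less than 18
df['Age_Group'][df['Age'] < 18] = '<18'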
Boolean indexing like this is a powerful DataFrame feature and lets you manipulate the DataFrame however you need.
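For more complex conditions, you can combine several conditions by wrapping each one in parentheses and joining them with an element-wise boolean operator (e.g. '&' or '|'). The parentheses are required because '&' and '|' bind more tightly than comparison operators such as '<' and '>', and Python's 'and'/'or' keywords do not work element-wise on a Series.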
You can see this at work in the second conditional statement above, the one that sets '>18'.
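A small sketch of combining masks on the same five ages, showing element-wise AND ('&'), OR ('|') and NOT ('~'); the variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Each comparison is wrapped in parentheses; without them, & would be applied
# to 18 and df['Age'] before the comparisons, which raises an error.
between_18_and_40 = (df['Age'] > 18) & (df['Age'] < 40)  # element-wise AND
outside_band = (df['Age'] < 18) | (df['Age'] > 40)       # element-wise OR
not_between = ~between_18_and_40                         # element-wise NOT

# Use a mask to select the matching rows.
print(df[between_18_and_40])
```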
Edit:
You can read more about DataFrame indexing and conditionals:
http://pandas.pydata.org/pandas-docs/dev/indexing.html#index-objects
Edit:
To see how it works:
>>> d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }
>>> df = pd.DataFrame(d)
>>> df
   Age
0   36
1   42
2    6
3   66
4   38
>>> df['Age_Group'] = '<40'
>>> df['Age_Group'][df['Age'] > 40] = '>40'
>>> df['Age_Group'][(df['Age'] > 18) & (df['Age'] < 40)] = '>18'
>>> df['Age_Group'][df['Age'] < 18] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18
Edit:
To do the same without chained indexing, use .loc (EdChum's approach):
>>> df['Age_Group'] = '<40'
>>> df.loc[df['Age'] > 40, 'Age_Group'] = '>40'
>>> df.loc[(df['Age'] > 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
>>> df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18
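A self-contained sketch of the .loc approach with one change that is an assumption, not part of the answer: the lower edge of each band uses >=, so the masks cover every age exactly once and ages of exactly 18 or 40 also get a label. The labels '18-39' and '>=40' are hypothetical, and two boundary ages are added to the sample data to show the effect:

```python
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38., 18., 40.]})

# Placeholder value; every row is overwritten by exactly one mask below.
df['Age_Group'] = ''
# Assumed bands: [0, 18), [18, 40), [40, inf). The masks do not overlap.
df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
df.loc[(df['Age'] >= 18) & (df['Age'] < 40), 'Age_Group'] = '18-39'
df.loc[df['Age'] >= 40, 'Age_Group'] = '>=40'
print(df)
```

Because the masks are exhaustive, no row is left holding the placeholder, unlike the '<40' default above, which silently catches ages of exactly 18 and 40.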
Answer by S.Zuo
You can also use nested np.where() calls:
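import numpy as np
# np.where(condition, value_if_true, value_if_false); the inner call acts as the "elif"
df['Age_Group'] = np.where(df.Age < 18, 'under 18',
                           np.where(df.Age < 40, 'under 40', '>40'))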
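The nested np.where() can be flattened with np.select, which takes a list of conditions and a matching list of values and picks, for each row, the value of the first condition that is True, much like an if/elif chain. This is a swapped-in alternative, not part of S.Zuo's answer; a minimal sketch with the question's labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Conditions are checked in order; the first True one wins, as with if/elif.
conditions = [df['Age'] < 18, df['Age'] < 40]
labels = ['under 18', 'under 40']
# Rows matching no condition get the default, which plays the role of "else".
df['Age_Group'] = np.select(conditions, labels, default='>40')
print(df)
```

Adding another band only means appending one condition and one label, instead of nesting another np.where() call.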

