Pandas variable creation using multiple If-else

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/22504329/

Time: 2020-09-13 21:49:36  Source: igfitidea

Tags: python, if-statement, pandas, where

Asked by Baktaawar

I need help with multiple IF-ELSE statements in Pandas. I have a test dataset (Titanic) as follows:

ID  Survived  Pclass  Name                    Sex     Age
1   0         3       Braund                  male    22
2   1         1       Cumings, Mrs.           female  38
3   1         3       Heikkinen, Miss. Laina  female  26
4   1         1       Futrelle, Mrs.          female  35
5   0         3       Allen, Mr.              male    35
6   0         3       Moran, Mr.              male
7   0         1       McCarthy, Mr.           male    54
8   0         3       Palsson, Master         male    2
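
To experiment with the answers below, the sample table can be rebuilt in memory. This is a minimal sketch and not part of the original question: it assumes the data is comma-separated (the real file format is not shown), quotes the names that contain commas, and leaves Moran's age empty so that pandas reads it as NaN.

```python
import io

import pandas as pd

# Hypothetical CSV reconstruction of the sample rows in the question.
# Names containing commas are quoted; Moran's Age is left empty.
CSV_TEXT = """ID,Survived,Pclass,Name,Sex,Age
1,0,3,Braund,male,22
2,1,1,"Cumings, Mrs.",female,38
3,1,3,"Heikkinen, Miss. Laina",female,26
4,1,1,"Futrelle, Mrs.",female,35
5,0,3,"Allen, Mr.",male,35
6,0,3,"Moran, Mr.",male,
7,0,1,"McCarthy, Mr.",male,54
8,0,3,"Palsson, Master",male,2
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# The empty Age cell becomes NaN, so the Age column is parsed as float64.
print(df.dtypes)
print(df[df["Age"].isna()])
```

Because of that NaN, a comparison such as df['Age'] < 18 is simply False for Moran's row.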

where ID is the passenger ID. I want to create a new flag variable in this DataFrame with the following rule:

if Sex=="female" or (Pclass==1 and Age <18) then 1 else 0. 

I tried a few approaches to do this. Here is my first attempt:

import pandas as pd

df = pd.read_csv('data.csv')
# Intended to set the flag for each passenger individually
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        df['Prediction'] = 1
    else:
        df['Prediction'] = 0

The problem with the above code is that it creates a Prediction column in df, but all of its values are 0.

However, if I use the same logic but store the output in a dictionary instead, it gives the right answer, as shown below:

import pandas as pd

prediction = {}
df = pd.read_csv('data.csv')
# Store each passenger's flag under their ID
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        prediction[passenger['ID']] = 1
    else:
        prediction[passenger['ID']] = 0

This produces a dict, prediction, whose keys are the passenger IDs and whose values are 1 or 0 according to the rule above.
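
If you already have such a dictionary, one way to turn it back into a DataFrame column is Series.map, which looks up each ID in the dict. This is a minimal sketch; the three-row DataFrame and the hard-coded dict are hypothetical stand-ins for the question's data and the loop's output.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (first three rows only).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sex": ["male", "female", "female"],
    "Pclass": [3, 1, 3],
    "Age": [22, 38, 26],
})

# A dict shaped like the one the loop builds: passenger ID -> 0 or 1.
prediction = {1: 0, 2: 1, 3: 1}

# Series.map replaces each ID with the value stored under that key.
df["Prediction"] = df["ID"].map(prediction)
print(df)
```

IDs missing from the dict would come back as NaN, so this only works cleanly when every row was visited by the loop.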

So why does the df version give the wrong result? I even tried defining a function first and then calling it, but it gave the same answer as the first approach.

So, how can we do this in pandas?

Secondly, I guess the same thing could be done with nested if-else expressions. I know about np.where, but it doesn't let me add an 'and' condition. Here is what I tried:

df['Prediction'] = np.where(df['Sex'] == "female", 1, np.where((df['Pclass'] == 1 and df['Age'] < 18), 1, 0))

The above raised an error because of the 'and' keyword inside np.where.

Can someone help? Solutions using multiple approaches, such as np.where (simple if-else style), some function (applymap, etc.), or modifications to what I wrote earlier, would be really appreciated.

Also, how do we do the same using the applymap or apply/map methods of a DataFrame?

Answer by unutbu

Instead of looping through the rows using df.iterrows (which is relatively slow), you can assign the desired values to the Prediction column in one assignment:

In [27]: df['Prediction'] = ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18))).astype('int')

In [29]: df['Prediction']
Out[29]: 
0    0
1    1
2    1
3    1
4    0
5    0
6    0
7    0
Name: Prediction, dtype: int32


For your first approach, remember that df['Prediction'] represents an entire column of df, so df['Prediction']=1 assigns the value 1 to every row in that column. Since df['Prediction']=0 was the last assignment, the entire column ended up being filled with zeros.
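
If you want to keep the loop, a minimal fix (a sketch, not from the original answer) is to write a single cell with df.loc[index, 'Prediction'] instead of assigning to the whole column. It is still much slower than the vectorized one-liner on large frames. The small DataFrame below is a hypothetical stand-in for the sample data, with Moran's missing age as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Start with 0 everywhere; the loop then overwrites individual cells.
df["Prediction"] = 0
for index, passenger in df.iterrows():
    flag = passenger["Sex"] == "female" or (passenger["Pclass"] == 1 and passenger["Age"] < 18)
    # .loc[index, column] writes one cell, unlike df["Prediction"] = ...,
    # which would overwrite the whole column on every iteration.
    df.loc[index, "Prediction"] = int(flag)

# The vectorized expression from the answer, for comparison.
vectorized = ((df["Sex"] == "female") | ((df["Pclass"] == 1) & (df["Age"] < 18))).astype(int)

print(df["Prediction"].tolist())  # [0, 1, 1, 1, 0, 0, 0, 0]
print((df["Prediction"] == vectorized).all())  # True
```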

For your second approach, note that you need to use & not and to perform an elementwise logical-and operation on two NumPy arrays or Pandas NDFrames. Thus, you could use

In [32]: np.where(df['Sex']=='female', 1, np.where((df['Pclass']==1)&(df['Age']<18), 1, 0))
Out[32]: array([0, 1, 1, 1, 0, 0, 0, 0])
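
The reason and fails is that Python asks each operand for a single True/False, and pandas refuses to give one for a multi-element Series (it raises ValueError). The parentheses around each comparison matter too: & binds more tightly than == and <, so leaving them out changes what is compared. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration.
pclass = pd.Series([1, 3, 1])
age = pd.Series([10, 2, 54])

# `and` calls bool() on a Series, which pandas rejects as ambiguous.
try:
    (pclass == 1) and (age < 18)
except ValueError as err:
    print("and failed:", err)

# Without parentheses this parses as the chained comparison
# `pclass == (1 & age) < 18`, which also calls bool() on a Series.
try:
    pclass == 1 & age < 18
except ValueError as err:
    print("missing parentheses:", err)

# `&` with parenthesized comparisons combines the Series element by element.
mask = (pclass == 1) & (age < 18)
print(mask.tolist())  # [True, False, False]

# np.where turns the boolean mask into a NumPy array of 1/0.
flags = np.where(mask, 1, 0)
print(flags.tolist())  # [1, 0, 0]
```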

though I think it is much simpler to just use | for logical-or and & for logical-and:

In [34]: ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18)))
Out[34]: 
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
dtype: bool
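
The question also asked about apply/applymap, which the answer above does not cover. applymap works cell by cell, so it cannot look at Sex, Pclass and Age together; DataFrame.apply with axis=1, which calls a Python function once per row, is the closer fit. This is a sketch rather than part of the original answer: it is easy to read but typically much slower than the vectorized expression on large frames, and the function name predict and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})


def predict(row):
    """Return 1 if female, or first class and under 18; otherwise 0."""
    # A NaN age compares as False with < 18, so a missing age yields 0 here.
    if row["Sex"] == "female" or (row["Pclass"] == 1 and row["Age"] < 18):
        return 1
    return 0


# axis=1 passes each row (as a Series) to predict.
df["Prediction"] = df.apply(predict, axis=1)
print(df["Prediction"].tolist())  # [0, 1, 1, 1, 0, 0, 0, 0]
```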