Creating a Pandas Variable Using Multiple If-Else Conditions

Question

Asked by Baktaawar

I need help with multiple if-else statements in Pandas. I have a test dataset (Titanic) as follows:

ID  Survived  Pclass  Name                    Sex     Age
1   0         3       Braund                  male    22
2   1         1       Cumings, Mrs.           female  38
3   1         3       Heikkinen, Miss. Laina  female  26
4   1         1       Futrelle, Mrs.          female  35
5   0         3       Allen, Mr.              male    35
6   0         3       Moran, Mr.              male
7   0         1       McCarthy, Mr.           male    54
8   0         3       Palsson, Master         male    2

where ID is the passenger ID. I want to create a new flag variable in this data frame using the following rule:

if Sex=="female" or (Pclass==1 and Age <18) then 1 else 0.

I tried a few approaches. This was my first attempt:

import pandas as pd

df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        df['Prediction'] = 1
    else:
        df['Prediction'] = 0

The problem with the above code is that it creates a Prediction variable in df, but with all values set to 0.

However, if I use the same code but write the output to a dictionary instead, it gives the right answer, as shown below:

prediction = {}
df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        prediction[passenger['ID']] = 1
    else:
        prediction[passenger['ID']] = 0

This gives a dict, prediction, with ID as the keys and 1 or 0 as the values, based on the above logic.

So why does the df version give the wrong result? I even tried defining a function first and then calling it, which gave the same answer as the first approach.

So, how can we do this in pandas?

Secondly, I guess the same can be done with multiple if-else statements. I know about np.where, but it does not allow me to add an 'and' condition. Here is what I was trying:

df['Prediction'] = np.where(df['Sex']=="female", 1, np.where((df['Pclass']==1 and df['Age']<18), 1, 0))

The above raised an error for the 'and' keyword inside np.where.

Can someone help? Solutions using multiple approaches, such as np.where (simple if-else style), a function (applymap etc.), or modifications to what I wrote earlier, would be really appreciated.

Also, how do we do the same using applymap or the apply/map methods of df?

Answer 1

Answered by unutbu

Instead of looping through the rows using df.iterrows (which is relatively slow), you can assign the desired values to the Prediction column in one assignment:

In [27]: df['Prediction'] = ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18))).astype('int')

In [29]: df['Prediction']
Out[29]: 
0    0
1    1
2    1
3    1
4    0
5    0
6    0
7    0
Name: Prediction, dtype: int32
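
To try the one-line assignment without the original CSV, here is a minimal sketch that rebuilds the eight sample rows by hand. The hand-built frame is an assumption; the asker loaded the data from a file, and the Name column is left out because the rule does not use it. Note that passenger 6 has no age: comparing NaN with `< 18` yields False, so that row falls through to 0.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; Age is missing for passenger 6.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'Survived': [0, 1, 1, 1, 0, 0, 0, 0],
    'Pclass': [3, 1, 3, 1, 3, 3, 1, 3],
    'Sex': ['male', 'female', 'female', 'female',
            'male', 'male', 'male', 'male'],
    'Age': [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Build the boolean mask element-wise, then convert True/False to 1/0.
mask = (df['Sex'] == 'female') | ((df['Pclass'] == 1) & (df['Age'] < 18))
df['Prediction'] = mask.astype(int)

print(df['Prediction'].tolist())  # [0, 1, 1, 1, 0, 0, 0, 0]
```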

For your first approach, remember that df['Prediction'] represents an entire column of df, so df['Prediction']=1 assigns the value 1 to every row in that column. Since df['Prediction']=0 was the last assignment executed, the entire column ended up filled with zeros.
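
If you do want to keep the explicit loop, one possible fix is to set a single cell per iteration rather than the whole column. The following is a sketch only, using a small hand-built frame in place of the asker's CSV (an assumption), and `df.at` for the per-cell assignment:

```python
import pandas as pd

# Small hand-built frame standing in for the asker's CSV (an assumption).
df = pd.DataFrame({
    'Pclass': [3, 1, 1],
    'Sex': ['male', 'female', 'male'],
    'Age': [22, 38, 10],
})

df['Prediction'] = 0  # create the column once, defaulting to 0
for idx, passenger in df.iterrows():
    if passenger['Sex'] == 'female' or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        # df.at sets exactly one cell: this row, this column.
        df.at[idx, 'Prediction'] = 1

print(df['Prediction'].tolist())  # [0, 1, 1]
```

This is still slower than the single vectorized assignment, so it is mainly useful for seeing where the original loop went wrong.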

For your second approach, note that you need to use & rather than and to perform an element-wise logical-and operation on two NumPy arrays or Pandas NDFrames. Thus, you could use

In [32]: np.where(df['Sex']=='female', 1, np.where((df['Pclass']==1)&(df['Age']<18), 1, 0))
Out[32]: array([0, 1, 1, 1, 0, 0, 0, 0])

though I think it is much simpler to just use | for logical-or and & for logical-and:

In [34]: ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18)))
Out[34]: 
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
dtype: bool
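
The question also asks about apply. As a hedged sketch (not part of the original answer), a row-wise function passed to `DataFrame.apply` with `axis=1` can express the same rule; the helper name `flag` and the small sample frame are hypothetical. Inside the function each value is a scalar, so plain Python `or` and `and` work there.

```python
import numpy as np
import pandas as pd

# Hand-built sample rows (an assumption); the last row has a missing Age.
df = pd.DataFrame({
    'Pclass': [3, 1, 1, 3],
    'Sex': ['male', 'female', 'male', 'male'],
    'Age': [22, 38, 10, np.nan],
})

def flag(row):
    # Each value here is a scalar, so Python's `or`/`and` are fine.
    if row['Sex'] == 'female' or (row['Pclass'] == 1 and row['Age'] < 18):
        return 1
    return 0

# axis=1 calls flag once per row.
df['Prediction'] = df.apply(flag, axis=1)

# Cross-check against the vectorized expression with | and &.
vectorized = ((df['Sex'] == 'female') |
              ((df['Pclass'] == 1) & (df['Age'] < 18))).astype(int)

print(df['Prediction'].tolist())  # [0, 1, 1, 0]
print((df['Prediction'] == vectorized).all())
```

The row-wise version is easier to read for complicated rules, but it calls a Python function per row, so the vectorized form is usually the better default on large frames.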
