Original question: http://stackoverflow.com/questions/30631841/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not the translator): Stack Overflow.
Pandas: How do I assign values based on multiple conditions for existing columns?
Asked by Eric
I would like to create a new column with a numerical value based on the following conditions:
a. if gender is male & pet1=pet2, points = 5
b. if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5
c. all other combinations, points = 0
gender pet1 pet2
0 male dog dog
1 male cat cat
2 male dog cat
3 female cat squirrel
4 female dog dog
5 female squirrel cat
6 squirrel dog cat
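To run the answers below directly, the sample frame from the question can be rebuilt as follows. This is a minimal sketch that assumes pandas is installed and, like the answers, names the frame df; the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the question's table (index 0..6)
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1':   ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2':   ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})
print(df)
```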
I would like the end result to be as follows:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
How do I accomplish this?
Accepted answer by EdChum
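You can do this with np.where. The conditions are combined with the bitwise operators & (for and) and | (for or), and each individual condition must be wrapped in parentheses because of operator precedence: & and | bind more tightly than comparisons such as ==. Where the combined condition is true, 5 is returned, and 0 otherwise: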
In [29]:
import numpy as np
df['points'] = np.where(((df['gender'] == 'male') & (df['pet1'] == df['pet2'])) |
                        ((df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])), 5, 0)
df
Out[29]:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
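A self-contained version of the np.where approach, as a sketch: it rebuilds the question's sample data and splits the combined condition into two named masks. The names is_male_match and is_female_cat_or_dog are illustrative only; they are not part of the answer. The resulting points column should match the output shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1':   ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2':   ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# Each comparison sits in its own parentheses: & and | bind more tightly than ==,
# so without them Python would try to evaluate 'male' & df['pet1'] first.
is_male_match = (df['gender'] == 'male') & (df['pet1'] == df['pet2'])
is_female_cat_or_dog = (df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])

# np.where returns 5 where the combined mask is True and 0 elsewhere
df['points'] = np.where(is_male_match | is_female_cat_or_dog, 5, 0)
print(df)
```

Naming the masks does not change the result; it only keeps the single np.where call short when the rules grow.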
Answer by Ruggero Turra
Answer by leonard
The apply method described by @RuggeroTurra takes much longer on 500k rows. I ended up using something like:
# The four masks are mutually exclusive, so each row receives exactly one of 2, 3, 4 or 5
df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
               ((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
               ((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
               ((df.a != 0) & (df.b == 1)).astype(int) * 5
On that data, the apply method took 25 seconds, while the approach above took about 18 ms.
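The answer does not show its data, so the following sketch invents a small frame with hypothetical integer columns a and b to show what the mask arithmetic produces, and writes the same mapping with np.select for comparison. The column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row for each combination of (a == 0?, b == 1?)
df = pd.DataFrame({'a': [0, 1, 0, 2], 'b': [0, 0, 1, 1]})

# Cast each boolean mask to 0/1 and weight it; the masks never overlap,
# so the sum equals the weight of the single mask that is True.
df['result'] = (((df.a == 0) & (df.b != 1)).astype(int) * 2
                + ((df.a != 0) & (df.b != 1)).astype(int) * 3
                + ((df.a == 0) & (df.b == 1)).astype(int) * 4
                + ((df.a != 0) & (df.b == 1)).astype(int) * 5)

# The same mapping expressed with np.select
conditions = [
    (df.a == 0) & (df.b != 1),
    (df.a != 0) & (df.b != 1),
    (df.a == 0) & (df.b == 1),
    (df.a != 0) & (df.b == 1),
]
df['result_select'] = np.select(conditions, [2, 3, 4, 5], default=0)
print(df)
```

The arithmetic version relies on the masks being mutually exclusive: if two masks could be true on the same row, their weights would add up, whereas np.select would take the first matching condition.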
Answer by Erfan
numpy.select
2020 answer
This is a perfect case for np.select, which builds a column from multiple conditions and stays readable as the number of conditions grows:
import numpy as np

# One boolean mask per rule; rows matching no rule get the default
conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]
df['points'] = np.select(conditions, [5, 5], default=0)
print(df)
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
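np.select evaluates the conditions in order and, for each row, takes the choice belonging to the first condition that is true. The sketch below illustrates this with a deliberately overlapping second rule (pet1 is 'dog' gives 3 points); that rule is hypothetical and not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1':   ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2':   ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# Hypothetical rule set: row 0 (male, dog, dog) satisfies both conditions
conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['pet1'].eq('dog'),
]
choices = [5, 3]

# The first true condition wins, so row 0 receives 5 rather than 3
df['points'] = np.select(conditions, choices, default=0)
print(df)
```

Because of this ordering, more specific rules should be listed before more general ones.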
Answer by George Pipis
You can also use DataFrame.apply. For example:
def myfunc(gender, pet1, pet2):
    if gender == 'male' and pet1 == pet2:
        myvalue = 5
    elif gender == 'female' and (pet1 == 'cat' or pet1 == 'dog'):
        myvalue = 5
    else:
        myvalue = 0
    return myvalue
Then call apply with axis=1 so that the function is applied to each row:
df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)
We get:
gender pet1 pet2 points
0 male dog dog 5
1 male cat cat 5
2 male dog cat 0
3 female cat squirrel 5
4 female dog dog 5
5 female squirrel cat 0
6 squirrel dog cat 0
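A sketch that checks the row-wise apply version against the vectorized np.where version on the question's sample data. With axis=1, apply calls the Python function once per row, which is consistent with the slowdown reported earlier for 500k rows. The myfunc here is a condensed rewrite of the function above using early returns; the column names points_apply and points_where are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1':   ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2':   ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

def myfunc(gender, pet1, pet2):
    # Same rules as in the answer, written as plain Python
    if gender == 'male' and pet1 == pet2:
        return 5
    if gender == 'female' and pet1 in ('cat', 'dog'):
        return 5
    return 0

# Row-wise apply: a Python-level loop that calls myfunc once per row
df['points_apply'] = df.apply(lambda r: myfunc(r['gender'], r['pet1'], r['pet2']), axis=1)

# Vectorized equivalent from the accepted answer
df['points_where'] = np.where(
    ((df['gender'] == 'male') & (df['pet1'] == df['pet2']))
    | ((df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])),
    5, 0)

print((df['points_apply'] == df['points_where']).all())
```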