Python Pandas: How to assign values based on multiple conditions on existing columns?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/30631841/

Time: 2020-08-19 08:44:15  Source: igfitidea

Pandas: How do I assign values based on multiple conditions for existing columns?

python pandas

Asked by Eric

I would like to create a new column with a numerical value based on the following conditions:

a. if gender is male & pet1=pet2, points = 5

b. if gender is female & (pet1 is 'cat' or pet1='dog'), points = 5

c. all other combinations, points = 0

    gender    pet1      pet2
0   male      dog       dog
1   male      cat       cat
2   male      dog       cat
3   female    cat       squirrel
4   female    dog       dog
5   female    squirrel  cat
6   squirrel  dog       cat
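
For anyone who wants to reproduce the examples, the table above can be built roughly as follows. This is a minimal sketch: the values are copied from the table, and the variable name df is an assumption chosen to match the answers.

```python
import pandas as pd

# Sample data copied from the table in the question.
# The name `df` is an assumption that matches the answers.
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

print(df)
```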

I would like the end result to be as follows:

    gender    pet1      pet2      points
0   male      dog       dog       5
1   male      cat       cat       5
2   male      dog       cat       0
3   female    cat       squirrel  5
4   female    dog       dog       5
5   female    squirrel  cat       0
6   squirrel  dog       cat       0

How do I accomplish this?

Accepted answer by EdChum

You can do this using np.where. The conditions use the bitwise operators & and | for and and or, with parentheses around each comparison because of operator precedence. Where the combined condition is true, 5 is returned, and 0 otherwise:

In [29]:
df['points'] = np.where( ( (df['gender'] == 'male') & (df['pet1'] == df['pet2'] ) ) | ( (df['gender'] == 'female') & (df['pet1'].isin(['cat','dog'] ) ) ), 5, 0)
df

Out[29]:
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
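
The same np.where call can be easier to read if each rule is first stored as a named boolean Series. The sketch below assumes the sample data from the question. The mask names male_same_pet and female_cat_or_dog are illustrative and not part of the original answer. The parentheses around each comparison are needed because & binds more tightly than == in Python.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# Rule a: male and both pets are the same (hypothetical mask name).
male_same_pet = (df['gender'] == 'male') & (df['pet1'] == df['pet2'])
# Rule b: female and pet1 is a cat or a dog (hypothetical mask name).
female_cat_or_dog = (df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])

# 5 where either rule holds, 0 everywhere else.
df['points'] = np.where(male_same_pet | female_cat_or_dog, 5, 0)

print(df['points'].tolist())  # [5, 5, 0, 5, 5, 0, 0]
```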

Answer by Ruggero Turra

Using apply:

def f(x):
    # x is a single row of the DataFrame
    if x['gender'] == 'male' and x['pet1'] == x['pet2']:
        return 5
    elif x['gender'] == 'female' and (x['pet1'] == 'cat' or x['pet1'] == 'dog'):
        return 5
    else:
        return 0

data['points'] = data.apply(f, axis=1)

Answer by leonard

The apply method described by @RuggeroTurra takes a lot longer for 500k rows. I ended up using something like:

df['result'] = ((df.a == 0) & (df.b != 1)).astype(int) * 2 + \
               ((df.a != 0) & (df.b != 1)).astype(int) * 3 + \
               ((df.a == 0) & (df.b == 1)).astype(int) * 4 + \
               ((df.a != 0) & (df.b == 1)).astype(int) * 5 

The apply method took 25 seconds, while the method above took about 18 ms.
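
Applied to the data from the question, the same mask-arithmetic idea might look like the sketch below. It assumes the sample frame from the question, and the mask names are illustrative. The approach relies on the masks being mutually exclusive: if two masks were true for the same row, their values would be added together. No timings are claimed for this small example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

# The two rules cannot both hold for one row (gender is either male or female),
# so each row receives at most one of the two contributions.
rule_a = (df['gender'] == 'male') & (df['pet1'] == df['pet2'])
rule_b = (df['gender'] == 'female') & df['pet1'].isin(['cat', 'dog'])

# Cast each boolean mask to 0/1, scale by its point value, and sum.
df['points'] = rule_a.astype(int) * 5 + rule_b.astype(int) * 5

print(df['points'].tolist())  # [5, 5, 0, 5, 5, 0, 0]
```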

Answer by Erfan

numpy.select

2020 answer

This is a perfect case for np.select, which creates a column based on multiple conditions and stays readable as the number of conditions grows:

conditions = [
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    df['gender'].eq('female') & df['pet1'].isin(['cat', 'dog'])
]

df['points'] = np.select(conditions, [5,5], default=0)

print(df)
     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0
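
One detail worth knowing when the conditions can overlap: np.select uses the choice of the first condition that is true for each row. The sketch below illustrates this with a hypothetical variation of the rules (3 points for any row whose pet1 is a cat or a dog, regardless of gender). That second rule and its value are assumptions made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female', 'squirrel'],
    'pet1': ['dog', 'cat', 'dog', 'cat', 'dog', 'squirrel', 'dog'],
    'pet2': ['dog', 'cat', 'cat', 'squirrel', 'dog', 'cat', 'cat'],
})

conditions = [
    # Rule from the question: male with two identical pets.
    df['gender'].eq('male') & df['pet1'].eq(df['pet2']),
    # Hypothetical rule for illustration: pet1 is a cat or a dog, any gender.
    df['pet1'].isin(['cat', 'dog']),
]
choices = [5, 3]

# Rows 0 and 1 satisfy both conditions; the first matching condition wins,
# so they get 5 rather than 3.
df['points'] = np.select(conditions, choices, default=0)

print(df['points'].tolist())  # [5, 5, 3, 3, 3, 0, 3]
```

Ordering the conditions from most specific to most general keeps the result predictable.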

Answer by George Pipis

You can also use the apply function. For example:

def myfunc(gender, pet1, pet2):
    if gender=='male' and pet1==pet2:
        myvalue=5
    elif gender=='female' and (pet1=='cat' or pet1=='dog'):
        myvalue=5
    else:
        myvalue=0
    return myvalue

Then use the apply function with axis=1:

df['points'] = df.apply(lambda x: myfunc(x['gender'], x['pet1'], x['pet2']), axis=1)

We get:

     gender      pet1      pet2  points
0      male       dog       dog       5
1      male       cat       cat       5
2      male       dog       cat       0
3    female       cat  squirrel       5
4    female       dog       dog       5
5    female  squirrel       cat       0
6  squirrel       dog       cat       0