Python: Creating a new column based on an if-elif-else condition

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21702342/

Time: 2020-08-18 23:26:46  Source: igfitidea

Creating a new column based on if-elif-else condition

Tags: python, pandas, conditional

Asked by nutship

I have a DataFrame df:


    A    B
a   2    2 
b   3    1
c   1    3

I want to create a new column based on the following criteria:

if row A == B: 0

if row A > B: 1

if row A < B: -1

so given the above table, it should be:


    A    B    C
a   2    2    0
b   3    1    1
c   1    3   -1 

For typical if-else cases I use np.where(df.A > df.B, 1, -1). Does pandas provide a special syntax for solving my problem in one step (without having to create 3 new columns and then combine the result)?

Accepted answer by Zelazny7

To formalize some of the approaches laid out above:


Create a function that operates on the rows of your dataframe like so:


def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe, passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized, so performance may not be as good when scaled to a large number of records. Still, I think it is much more readable, especially if you are coming from a SAS background.
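If performance matters, one vectorized alternative that keeps the if-elif-else shape is np.select. This is not part of the answer above, just a minimal sketch that rebuilds the question's sample data and lists the conditions in the same order as the function f:

```python
import numpy as np
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({"A": [2, 3, 1], "B": [2, 1, 3]}, index=["a", "b", "c"])

# Conditions are checked in order; the first one that is True picks the value.
conditions = [df["A"] == df["B"], df["A"] > df["B"]]
choices = [0, 1]

# default plays the role of the final else branch (A < B).
df["C"] = np.select(conditions, choices, default=-1)
print(df)
```

The conditions list reads like the if/elif chain and default like the else, so adding another branch only means appending one condition and one choice.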

Answer by DSM

For this particular relationship, you could use np.sign:


>>> df["C"] = np.sign(df.A - df.B)
>>> df
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Answer by Brian

df.loc[df['A'] == df['B'], 'C'] = 0
df.loc[df['A'] > df['B'], 'C'] = 1
df.loc[df['A'] < df['B'], 'C'] = -1

This is easy to solve using indexing. The first line of code reads like so: if column A is equal to column B, then create column C and set it equal to 0.
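One thing to watch with this approach: the first .loc assignment creates C with NaN in every row that did not match, so the column will usually end up as a float type. A small sketch, assuming the question's sample data, that casts back to integers once all rows have been filled:

```python
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({"A": [2, 3, 1], "B": [2, 1, 3]}, index=["a", "b", "c"])

df.loc[df["A"] == df["B"], "C"] = 0   # creates C; unmatched rows hold NaN for now
df.loc[df["A"] > df["B"], "C"] = 1
df.loc[df["A"] < df["B"], "C"] = -1

# The three masks cover every row here, so no NaN remains and the cast is safe.
df["C"] = df["C"].astype(int)
print(df)
```

If A or B could contain missing values, none of the three masks would match those rows, and the cast would fail until the remaining NaN values are handled.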

Answer by Ravi G


Let's say you have an original dataframe with an 'age' column and you want to add a new column 'elderly'.

If age is 50 or greater, we mark the row as elderly="yes", otherwise "no".

Step 1: Get the index of the rows whose age is 50 or greater

row_indexes=df[df['age']>=50].index

Step 2: Using .loc, assign a new value to the column for those rows

df.loc[row_indexes,'elderly']="yes"


The same goes for ages below 50:

row_indexes=df[df['age']<50].index

df.loc[row_indexes,'elderly']="no"
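For a two-way split like this, the same result can probably be reached in one step with np.where, as the question already hints. The sketch below uses made-up data, since the original screenshot is not available; the names and ages are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the dataframe in the missing screenshot.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [62, 35, 50]})

# "yes" where age is 50 or greater, "no" everywhere else.
df["elderly"] = np.where(df["age"] >= 50, "yes", "no")
print(df)
```

This avoids building row_indexes twice and guarantees every row gets a value.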