Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/44921444/


Pandas equivalent of SQL case when statement to create new variable

python, pandas

Asked by Antonio Serrano

I have this df:

import numpy as np
import pandas as pd
data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

which looks like this:

     a    b
    --------
0   NaN  0.0
1   2.0  0.0
2   NaN  1.0

My goal is to create a third column "c" that is 1 when column "a" is NaN and column "b" equals 0, and 0 otherwise. The equivalent SQL CASE expression would be:

(CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END) AS C

The desired output is this:

     a    b   c
    -----------
0   NaN  0.0  1
1   2.0  0.0  0
2   NaN  1.0  0

My (incorrect) attempt:

df['c'] = np.where(df['a']==np.nan & df['b'] == 0, 1, 0)

Many thanks.

Answered by Zero

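You're almost there. Use np.where(df['a'].isnull() & (df['b'] == 0), 1, 0) instead: isnull() performs the null check, and the parentheses around df['b'] == 0 are needed because & binds more tightly than ==.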

Alternatively,

In [258]: df['c'] = (df['a'].isnull() & (df['b'] == 0)).astype(int)

In [259]: df
Out[259]:
     a    b  c
0  NaN  0.0  1
1  2.0  0.0  0
2  NaN  1.0  0
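
To make the difference from the original attempt concrete, here is a minimal sketch. It assumes the same three-row frame as in the question (rebuilt here from a dict so the snippet runs on its own); the names mask, via_where and via_astype are only illustrative. The exact exception raised by the unparenthesized attempt may differ between pandas versions, so the sketch catches it broadly.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({"a": [np.nan, 2.0, np.nan], "b": [0.0, 0.0, 1.0]})

# & binds more tightly than ==, so the original attempt is parsed as
#   df['a'] == (np.nan & df['b']) == 0
# and raises instead of producing a boolean mask.
try:
    _ = df["a"] == np.nan & df["b"] == 0
    attempt_failed = False
except Exception as exc:  # exact type and message depend on the pandas version
    attempt_failed = True
    print("original attempt raised:", type(exc).__name__)

# Parenthesized comparison plus isnull() gives the intended mask.
mask = df["a"].isnull() & (df["b"] == 0)

via_where = np.where(mask, 1, 0)   # NumPy array of 1/0
via_astype = mask.astype(int)      # pandas Series of 1/0

print(via_where.tolist())
print(via_astype.tolist())
```

Both forms yield the same values; np.where is the more general of the two, since it also accepts outputs other than 1 and 0.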

Answered by Akshay Kandul

You can't test whether Series values are NaN by comparing them with np.nan, because NaN never compares equal to anything, including itself. Use series.isnull() instead.

The code below gives the desired output:

df['c'] = np.where(df['a'].isnull() & np.array(df['b'] == 0), 1, 0)
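
As a quick illustration of why the equality test against np.nan never matches, here is a small sketch. The Series s is a made-up stand-in for column "a", and eq_mask and null_mask are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for column "a" from the question.
s = pd.Series([np.nan, 2.0, np.nan])

# NaN is not equal to anything, not even to itself.
print(np.nan == np.nan)  # False

# So an elementwise == against np.nan is False everywhere ...
eq_mask = s == np.nan

# ... while isnull() (an alias of isna()) flags the missing entries.
null_mask = s.isnull()

print(eq_mask.tolist())
print(null_mask.tolist())
```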

Answered by vishnu_tushaar

For more control over the conditions, use np.select. It is very similar to CASE WHEN and scales to multiple conditions, each with its own output.

df['c'] = np.select(
    [df['a'].isnull() & (df['b'] == 0)],  # conditions, checked in order
    [1],                                  # output for each condition
    default=0,                            # output when no condition matches
)
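
To show how this scales past a single WHEN branch, here is a hedged sketch with a second condition. The second branch and its output value 2 are invented purely for illustration and are not part of the question; the frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({"a": [np.nan, 2.0, np.nan], "b": [0.0, 0.0, 1.0]})

# Conditions are evaluated in order; the first one that matches wins,
# just like successive WHEN branches in SQL.
conditions = [
    df["a"].isnull() & (df["b"] == 0),  # WHEN a IS NULL AND b = 0
    df["a"].isnull() & (df["b"] == 1),  # hypothetical second WHEN branch
]
choices = [1, 2]  # THEN values; 2 is an illustrative assumption

df["c"] = np.select(conditions, choices, default=0)  # ELSE 0
print(df)
```

Each extra CASE branch becomes one more entry in both lists, so the two lists must stay the same length.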

Answered by Nic Scozzaro

My personal preference is to use the pandas apply function with an if statement:

df['c'] = df.apply(lambda x: (1 if np.isnan(x['a']) and x['b'] == 0 else 0), axis=1)
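
For comparison, here is a sketch of the same row-wise logic written as a named function, checked against the vectorized mask from the earlier answers. The helper case_when and the column names c_apply and c_vec are hypothetical. Keep in mind that apply with axis=1 calls the function once per row in Python, so on large frames the vectorized form is usually the faster choice.

```python
import numpy as np
import pandas as pd

# Same data as in the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({"a": [np.nan, 2.0, np.nan], "b": [0.0, 0.0, 1.0]})


def case_when(row):
    """Row-wise CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END."""
    if pd.isna(row["a"]) and row["b"] == 0:
        return 1
    return 0


# Row-wise version: readable, one Python call per row.
df["c_apply"] = df.apply(case_when, axis=1)

# Vectorized version: a single boolean mask cast to int.
df["c_vec"] = (df["a"].isna() & (df["b"] == 0)).astype(int)

print(df)
```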