Pandas equivalent of SQL case when statement to create new variable
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/44921444/
Asked by Antonio Serrano
I have this df:
import numpy as np
import pandas as pd
data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])
which looks like this:
a b
--------
0 NaN 0.0
1 2.0 0.0
2 NaN 1.0
My goal is to create a third column "c" that is 1 when column "a" is NaN and column "b" equals 0, and 0 otherwise. The equivalent SQL CASE statement would be:
(CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END) AS C
The desired output is this:
a b c
-----------
0 NaN 0.0 1
1 2.0 0.0 0
2 NaN 1.0 0
My (wrong) try:
df['c'] = np.where(df['a']==np.nan & df['b'] == 0, 1, 0)
Many thanks.
Answered by Zero
You're almost there. For the null check, use np.where(df['a'].isnull() & (df['b'] == 0), 1, 0) instead.
Alternatively,
In [258]: df['c'] = (df['a'].isnull() & (df['b'] == 0)).astype(int)
In [259]: df
Out[259]:
a b c
0 NaN 0.0 1
1 2.0 0.0 0
2 NaN 1.0 0
Answered by Akshay Kandul
You can't check whether a Series value is NaN by comparing it with np.nan; use series.isnull() instead.
The code below gives the desired output:
df['c'] = np.where(df['a'].isnull() & np.array(df['b'] == 0),1,0)
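A minimal sketch of why the equality test cannot work: NaN never compares equal to anything, including itself, so `== np.nan` is False on every row, whereas isnull() flags the missing ones. The Series s and the mask names are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan])

# NaN is not equal to anything, not even to itself.
print(np.nan == np.nan)

# An element-wise equality test therefore never matches a missing value...
eq_mask = (s == np.nan)
# ...while isnull() marks exactly the missing rows.
null_mask = s.isnull()

print(eq_mask.tolist())
print(null_mask.tolist())
```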
Answered by vishnu_tushaar
For more control over the conditions, use np.select. It is very similar to CASE WHEN and scales up to multiple conditions and outputs.
df['c'] = np.select(
    [df['a'].isnull() & (df['b'] == 0)],  # list of conditions
    [1],                                  # value for each condition
    default=0,                            # value when no condition matches
)
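To illustrate the "multiple outputs" point, here is a hedged sketch of np.select with two branches. The second condition and the value 2 are invented for the example and are not part of the original question; conditions are evaluated in order and the first match wins, much like successive WHEN clauses.

```python
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

# Each condition pairs with the choice at the same position.
conditions = [
    df['a'].isnull() & (df['b'] == 0),   # WHEN a IS NULL AND b = 0
    df['a'].isnull(),                    # WHEN a IS NULL (illustrative extra branch)
]
choices = [1, 2]                         # THEN 1 / THEN 2 (2 is an assumed value)

# default plays the role of ELSE.
df['c'] = np.select(conditions, choices, default=0)
print(df)
```

Row 0 matches both conditions but takes the first choice, so ordering the conditions from most to least specific matters.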
Answered by Nic Scozzaro
My personal preference is to use the pandas apply function with an if statement:
df['c'] = df.apply(lambda x: (1 if np.isnan(x['a']) and x['b'] == 0 else 0), axis=1)
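The same idea can be sketched with a named function, which may read more easily once there are several branches. The helper name case_when is hypothetical, and the columns are accessed by label rather than by position. Note that a row-wise apply calls the Python function once per row, so on large frames it is typically slower than the vectorized answers above.

```python
import numpy as np
import pandas as pd

data = np.array([[np.nan, 0], [2, 0], [np.nan, 1]])
df = pd.DataFrame(data=data, columns=['a', 'b'])

def case_when(row):
    # Hypothetical helper mirroring:
    # CASE WHEN a IS NULL AND b = 0 THEN 1 ELSE 0 END
    if pd.isna(row['a']) and row['b'] == 0:
        return 1
    return 0

# axis=1 passes each row to the function as a Series.
df['c'] = df.apply(case_when, axis=1)

# Cross-check against the vectorized form.
vectorized = (df['a'].isnull() & (df['b'] == 0)).astype(int)
print(df['c'].tolist() == vectorized.tolist())
```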