Using conditional if/else logic with pandas dataframe columns
Original question: http://stackoverflow.com/questions/52457656/
Source: Stack Overflow, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Aaraeus
My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to apply some conditional logic to create another column called WINNER based on pw1 and pw2.
+-------------------------+-------------+-----------+-------------+
| Name1 | pw1 | Name2 | pw2 |
+-------------------------+-------------+-----------+-------------+
| Seaking | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn | 0.172510623 | Quagsire | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy | 0.28681284 | NaN | NaN |
+-------------------------+-------------+-----------+-------------+
I want to do this conditionally in a function but I'm having some trouble.
- if pw1 > pw2, populate with Name1
- if pw2 > pw1, populate with Name2
- if pw1 is populated but pw2 isn't, populate with Name1
- if pw2 is populated but pw1 isn't, populate with Name2
But my function isn't working - for some reason checking if a value is null isn't working.
    def final_winner(df):
        # If PW1 is missing and PW2 is populated, Pokemon 1 wins
        if df['pw1'] = None and df['pw2'] != None:
            return df['Number1']
        # If it's the same thing but the other way around, Pokemon 2 wins
        elif df['pw2'] = None and df['pw1'] != None:
            return df['Number2']
        # If pw2 is greater than pw1, then Pokemon 2 wins
        elif df['pw2'] > df['pw1']:
            return df['Number2']
        else
            return df['Number1']

    pw2['Winner'] = pw2.apply(final_winner, axis=1)
Answered by rafaelc
Do not use apply, which is very slow because it calls the function once per row. Use np.where instead:
    import numpy as np

    pw2 = df.pw2.fillna(-np.inf)
    df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)
Since NaNs always lose, you can just fillna() them with -np.inf to yield the same logic.
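As a minimal sketch of the approach above, the snippet below rebuilds the sample rows from the question and applies fillna() followed by np.where. The variable names df and pw2_filled are assumptions made for illustration; the column names come from the question's table.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's table; `df` is an assumed variable name.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# Treat a missing pw2 as the lowest possible score so Name1 wins that row.
pw2_filled = df["pw2"].fillna(-np.inf)

# A missing pw1 makes the comparison False (NaN > x is always False),
# so Name2 is picked for that row without any extra handling.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

On this sample the winners should be Seaking, Quagsire, Thundurus Therian Forme and Flaaffy, the last one because its pw2 is missing.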
Looking at your code, we can point out several problems. First, you wrote df['pw1'] = None, which is invalid Python syntax for a comparison, because a single = is assignment. You usually compare things with the == operator. For None, however, the is operator is recommended, as in if variable is None: (...). Even so, you are in a pandas/numpy environment, where there are actually several representations of a null value (None, NaN, NaT, etc.).
So, it is preferable to check for null values using pd.isnull() or df.isnull().
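To illustrate why equality checks are a poor fit here, the following sketch compares a few missing-value markers directly and then with pd.isnull(). The names markers, flags and s are assumptions introduced only for this example.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself, so == cannot detect it.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)  # False

# pd.isnull recognises each of the common missing-value markers.
markers = (None, np.nan, pd.NaT)
flags = [bool(pd.isnull(value)) for value in markers]
print(flags)

# The same check works element-wise on a whole column.
s = pd.Series([0.5, np.nan, None])
print(s.isnull().tolist())
```

Inside a row-wise function, pd.isnull(row['pw1']) therefore covers None and NaN alike, which a comparison against None does not.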
Just to illustrate, this is how your code should look:
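    import pandas as pd

    def final_winner(row):
        # Only pw2 is populated: Name2 wins
        if pd.isnull(row['pw1']) and not pd.isnull(row['pw2']):
            return row['Name2']
        # Only pw1 is populated: Name1 wins
        elif pd.isnull(row['pw2']) and not pd.isnull(row['pw1']):
            return row['Name1']
        elif row['pw2'] > row['pw1']:
            return row['Name2']
        else:
            return row['Name1']

    # axis=1 passes each row to final_winner as a Series
    df['winner'] = df.apply(final_winner, axis=1)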
But again, definitely use np.where.
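As a rough check that the two approaches agree, the sketch below runs a row-wise function and the np.where version on a small hypothetical frame with one row per rule from the question. The frame contents and the names by_apply and by_where are made up for illustration, and the row-wise function follows the rules as listed in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: pw1 higher, pw2 higher, pw2 missing, pw1 missing.
df = pd.DataFrame({
    "Name1": ["A1", "B1", "C1", "D1"],
    "pw1": [0.6, 0.2, 0.3, np.nan],
    "Name2": ["A2", "B2", "C2", "D2"],
    "pw2": [0.4, 0.5, np.nan, 0.7],
})

def final_winner(row):
    # Row-wise version: explicit null checks first, then the comparison.
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    if pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    return row["Name2"] if row["pw2"] > row["pw1"] else row["Name1"]

by_apply = df.apply(final_winner, axis=1)

# Vectorized version: a single comparison over whole columns.
by_where = np.where(df["pw1"] > df["pw2"].fillna(-np.inf), df["Name1"], df["Name2"])

print(list(by_apply) == list(by_where))  # True
```

The two versions are not identical in every edge case: when pw1 equals pw2, or when both are missing, the row-wise function falls through to Name1 while the np.where expression picks Name2. If those cases matter, the condition needs to be adjusted accordingly.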