Using conditional if/else logic with pandas dataframe columns
Original question: http://stackoverflow.com/questions/52457656/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by Aaraeus
My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are probabilities of winning. I'd like to perform some conditional logic to create another column called WINNER based on pw1 and pw2.
+-------------------------+-------------+-----------+-------------+
| Name1 | pw1 | Name2 | pw2 |
+-------------------------+-------------+-----------+-------------+
| Seaking | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn | 0.172510623 | Quagsire | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy | 0.28681284 | NaN | NaN |
+-------------------------+-------------+-----------+-------------+
I want to do this conditionally in a function but I'm having some trouble.
- if pw1 > pw2, populate with Name1
- if pw2 > pw1, populate with Name2
- if pw1 is populated but pw2 isn't, populate with Name1
- if pw2 is populated but pw1 isn't, populate with Name2
But my function isn't working - for some reason checking if a value is null isn't working.
def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)
Answered by rafaelc
Do not use apply here: with axis=1 it calls a Python function once per row, which is very slow on large frames. Use np.where, which is vectorized:
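import numpy as np
# df is the dataframe from the question; a missing pw2 becomes -inf so it always loses
pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)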
Since NaNs always lose, you can just fillna() the pw2 column with -np.inf to get the same logic. A missing pw1 needs no special handling, because any comparison against NaN is False and so Name2 is picked.
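As a rough illustration, the snippet below rebuilds the sample table from the question and applies the np.where approach to it. The data values are copied from the question; the variable name pw2_filled is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it.
pw2_filled = df["pw2"].fillna(-np.inf)

# A missing pw1 compares as False against anything, so Name2 is chosen.
df["winner"] = np.where(df["pw1"] > pw2_filled, df["Name1"], df["Name2"])

print(df[["Name1", "Name2", "winner"]])
```

On these four rows the winners come out as Seaking, Quagsire, Thundurus Therian Forme and Flaaffy. If both probabilities are missing, the comparison is False and the result is whatever Name2 holds for that row.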
Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax: a single = is assignment, not comparison. You usually want to compare things using the == operator. For None, however, it is recommended to use is, such as if variable is None: (...). But you are in a pandas/numpy environment, where there are actually several different null values (None, NaN, NaT, etc.), and a check against None will not catch the others.
So, it is preferable to check for null values using pd.isnull() or df.isnull().
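A small sketch of why the None checks fail and what pd.isnull() does instead. The variable names here (nan, mask) are only illustrative.

```python
import numpy as np
import pandas as pd

nan = float("nan")

# NaN is not None, and NaN is not even equal to itself,
# so neither `is None` nor `==` detects it.
print(nan is None)   # False
print(nan == nan)    # False

# pd.isnull recognises the usual missing-value markers.
for value in (None, np.nan, pd.NaT, 0.5):
    print(repr(value), pd.isnull(value))

# On a whole column, isnull() returns a boolean mask.
pw2 = pd.Series([0.189236181, np.nan])
mask = pw2.isnull()
print(mask.tolist())  # [False, True]
```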
Just to illustrate, this is how your code should look:
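import pandas as pd

def final_winner(df):
    # df is a single row here, because apply is called with axis=1
    # pw1 missing, pw2 populated: Pokemon 2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # pw2 missing, pw1 populated: Pokemon 1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)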
But again, definitely use np.where.
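One way to gain confidence before switching is to run a row-wise function and the np.where version side by side and check that they agree. This is only a sketch: the parameter name row and the column names winner_apply and winner_where are illustrative choices, and the last row of the data is made up to cover the case where pw1 is missing, which the question's table does not include.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # First four rows are from the question; the last row is a made-up
    # example added only to exercise the "pw1 missing" branch.
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy", "ExampleA"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284, np.nan],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan, "ExampleB"],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan, 0.4],
})

def final_winner(row):
    # Row-wise version of the four rules from the question.
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    elif pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    elif row["pw2"] > row["pw1"]:
        return row["Name2"]
    else:
        return row["Name1"]

df["winner_apply"] = df.apply(final_winner, axis=1)
df["winner_where"] = np.where(
    df["pw1"] > df["pw2"].fillna(-np.inf), df["Name1"], df["Name2"]
)

# Both columns should hold the same names on every row.
print((df["winner_apply"] == df["winner_where"]).all())
```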
