Using conditional if/else logic with pandas dataframe columns

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/52457656/

Time: 2020-09-14 06:02:39  Source: igfitidea

Tags: python, pandas, dataframe, if-statement

Asked by Aaraeus

My dataframe, called pw2, looks something like this. It has two columns, pw1 and pw2, which are win probabilities. I'd like to apply some conditional logic to create another column called WINNER based on pw1 and pw2.

+-------------------------+-------------+-----------+-------------+
|          Name1          |     pw1     |   Name2   |     pw2     |
+-------------------------+-------------+-----------+-------------+
| Seaking                 | 0.517184213 | Lickitung | 0.189236181 |
| Ferrothorn              | 0.172510623 | Quagsire  | 0.260884258 |
| Thundurus Therian Forme | 0.772536272 | Hitmonlee | 0.694069408 |
| Flaaffy                 | 0.28681284  | NaN       | NaN         |
+-------------------------+-------------+-----------+-------------+

I want to do this conditionally in a function but I'm having some trouble.

  • if pw1 > pw2, populate with Name1
  • if pw2 > pw1, populate with Name2
  • if pw1 is populated but pw2 isn't, populate with Name1
  • if pw2 is populated but pw1 isn't, populate with Name2

But my function isn't working - for some reason checking if a value is null isn't working.

def final_winner(df):
    # If PW1 is missing and PW2 is populated, Pokemon 1 wins
    if df['pw1'] = None and df['pw2'] != None:
        return df['Number1']
    # If it's the same thing but the other way around, Pokemon 2 wins
    elif df['pw2'] = None and df['pw1'] != None:
        return df['Number2']
    # If pw2 is greater than pw1, then Pokemon 2 wins
    elif df['pw2'] > df['pw1']:
        return df['Number2']
    else
        return df['Number1']

pw2['Winner'] = pw2.apply(final_winner, axis=1)

Answered by rafaelc

Do not use apply, which is very slow. Use np.where instead:

pw2 = df.pw2.fillna(-np.inf)
df['winner'] = np.where(df.pw1 > pw2, df.Name1, df.Name2)

Since NaNs always lose, you can just fillna() them with -np.inf to get the same logic.
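
As a minimal sketch, the snippet below rebuilds the four sample rows from the question and runs the np.where approach on them. The frame name df follows the answer, and the data literal is simply retyped from the question's table.

```python
import numpy as np
import pandas as pd

# Sample rows retyped from the question's table.
df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Thundurus Therian Forme", "Flaaffy"],
    "pw1": [0.517184213, 0.172510623, 0.772536272, 0.28681284],
    "Name2": ["Lickitung", "Quagsire", "Hitmonlee", np.nan],
    "pw2": [0.189236181, 0.260884258, 0.694069408, np.nan],
})

# A missing pw2 becomes -inf, so any populated pw1 beats it.
pw2 = df["pw2"].fillna(-np.inf)

# A missing pw1 makes the comparison False, so Name2 is picked for that row.
df["winner"] = np.where(df["pw1"] > pw2, df["Name1"], df["Name2"])

print(df["winner"].tolist())
# ['Seaking', 'Quagsire', 'Thundurus Therian Forme', 'Flaaffy']
```

Only pw2 needs filling: any comparison against NaN is False, so a missing pw1 already falls through to Name2.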

Looking at your code, we can point out several problems. First, you are comparing with df['pw1'] = None, which is invalid Python syntax for a comparison. You usually want to compare things using the == operator. However, for None, it is recommended to use is, such as if variable is None: (...). Then again, you are in a pandas/numpy environment, where there are actually several representations of a null value (None, NaN, NaT, etc.).
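
To see why neither is nor == helps here, consider this small sketch. The Series name probs and its values are illustrative assumptions; the point is that pandas stores a missing float as NaN, which is not None and does not even equal itself.

```python
import pandas as pd

# A None placed in a float column is stored as NaN.
probs = pd.Series([0.694069408, None])
value = probs.iloc[1]

is_none = value is None          # NaN is not the None object
eq_none = bool(value == None)    # comparing NaN with None is False
eq_self = bool(value == value)   # NaN never compares equal, even to itself

print(is_none, eq_none, eq_self)
# False False False
```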

So, it is preferable to check for null values using pd.isnull() or df.isnull().
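
A short sketch of both forms follows. The list scalars and the Series pw2 are made-up inputs for illustration; pd.isnull and Series.isnull are the real pandas calls.

```python
import numpy as np
import pandas as pd

# pd.isnull treats None, NaN and NaT alike when given a scalar.
scalars = [None, np.nan, pd.NaT, 0.5]
flags = [bool(pd.isnull(v)) for v in scalars]
print(flags)
# [True, True, True, False]

# On a column, isnull() returns a boolean mask with one entry per row.
pw2 = pd.Series([0.189236181, np.nan], name="pw2")
mask = pw2.isnull()
print(mask.tolist())
# [False, True]
```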

Just to illustrate, this is what your code should look like:

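import pandas as pd

def final_winner(df):
    # With axis=1, df here is a single row of the frame.
    # Only pw2 is populated: Name2 wins
    if pd.isnull(df['pw1']) and not pd.isnull(df['pw2']):
        return df['Name2']
    # Only pw1 is populated: Name1 wins
    elif pd.isnull(df['pw2']) and not pd.isnull(df['pw1']):
        return df['Name1']
    # Both populated: the higher probability wins
    elif df['pw2'] > df['pw1']:
        return df['Name2']
    else:
        return df['Name1']

df['winner'] = df.apply(final_winner, axis=1)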

But again, definitely use np.where.
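
As a rough check that the two approaches agree, the sketch below runs a row-wise function and np.where on the same frame. The helper name pick_winner and the last row (MissingNo vs. Pikachu) are hypothetical, added only to exercise the case where pw1 is missing; the other rows come from the question's table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name1": ["Seaking", "Ferrothorn", "Flaaffy", "MissingNo"],
    "pw1": [0.517184213, 0.172510623, 0.28681284, np.nan],
    "Name2": ["Lickitung", "Quagsire", np.nan, "Pikachu"],
    "pw2": [0.189236181, 0.260884258, np.nan, 0.4],  # last row is hypothetical
})

def pick_winner(row):
    """Row-wise version of the four rules listed in the question."""
    if pd.isnull(row["pw1"]) and not pd.isnull(row["pw2"]):
        return row["Name2"]
    if pd.isnull(row["pw2"]) and not pd.isnull(row["pw1"]):
        return row["Name1"]
    return row["Name2"] if row["pw2"] > row["pw1"] else row["Name1"]

# Row-wise: one Python function call per row.
by_apply = df.apply(pick_winner, axis=1).tolist()

# Vectorized: one comparison over whole columns.
filled = df["pw2"].fillna(-np.inf)
by_where = np.where(df["pw1"] > filled, df["Name1"], df["Name2"]).tolist()

print(by_apply == by_where)
# True
```

The results match on this data, while np.where avoids calling a Python function once per row.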
