Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/36603018/

Date: 2020-08-19 18:05:19  Source: igfitidea

pandas multiple conditions based on multiple columns using np.where

Tags: python, numpy, pandas, conditional-statements

Asked by Robert

I am trying to color the points of a pandas DataFrame depending on TWO conditions. Example:

If the value of col1 > a (float) AND the value of col2 - the value of col3 < b (float), then the value of col4 = one string; otherwise, another string.

I have tried many different approaches by now, and everything I found online depended on only one condition.

My example code always raises the error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here's the code. I tried several variations without success.

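import numpy as np
import pandas as pd

df = pd.DataFrame()

df['A'] = range(10)
df['B'] = range(11, 21, 1)
df['C'] = range(20, 10, -1)

borderE = 3.
ex = 0.

# print(df)

# This line raises ValueError: The truth value of a Series is ambiguous.
df['color'] = np.where(all([df.A < borderE, df.B - df.C < ex]), 'r', 'b')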

Btw: I understand what the error says, but not how to handle it... Thanks in advance!

Answered by Alexander

The selection criteria use Boolean indexing: combine the conditions element-wise with &, wrapping each one in parentheses:

df['color'] = np.where(((df.A < borderE) & ((df.B - df.C) < ex)), 'r', 'b')

>>> df
   A   B   C color
0  0  11  20     r
1  1  12  19     r
2  2  13  18     r
3  3  14  17     b
4  4  15  16     b
5  5  16  15     b
6  6  17  14     b
7  7  18  13     b
8  8  19  12     b
9  9  20  11     b
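
To make the difference concrete, here is a minimal sketch that reuses the question's data. The names `cond_a`, `cond_b`, `raised`, and `mask` are illustrative and do not appear in the original posts. The built-in `all` asks each Series for a single truth value, which pandas refuses to give, whereas `&` combines the two conditions row by row.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': range(10),
    'B': range(11, 21),
    'C': range(20, 10, -1),
})
borderE = 3.
ex = 0.

# Each comparison yields a boolean Series with one entry per row.
cond_a = df.A < borderE
cond_b = (df.B - df.C) < ex

# The built-in all() calls bool() on each Series, which is ambiguous.
try:
    all([cond_a, cond_b])
    raised = False
except ValueError:
    raised = True

# The & operator combines the two Series element-wise instead.
mask = cond_a & cond_b
colors = np.where(mask, 'r', 'b')
```

The parentheses around each comparison matter because `&` binds more tightly than `<` in Python, so `df.A < borderE & ...` would be parsed differently from what is intended.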

Answered by Sam

Wrap the IF in a function and apply it:

def color(row):
    borderE = 3.
    ex = 0.
    # Both checks operate on scalar values of a single row, so plain "and" works here.
    if (row.A > borderE) and (row.B - row.C < ex):
        return "somestring"
    else:
        return "otherstring"

df.loc[:, 'color'] = df.apply(color, axis=1)

Yields:

   A   B   C        color
0  0  11  20  otherstring
1  1  12  19  otherstring
2  2  13  18  otherstring
3  3  14  17  otherstring
4  4  15  16   somestring
5  5  16  15  otherstring
6  6  17  14  otherstring
7  7  18  13  otherstring
8  8  19  12  otherstring
9  9  20  11  otherstring
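
As a cross-check, the row-wise function can also be written with the `np.where` pattern from the first answer. The sketch below assumes the question's data and this answer's `A > borderE` condition; the names `row_wise`, `vectorized`, and `same` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'A': range(10),
    'B': range(11, 21),
    'C': range(20, 10, -1),
})
borderE = 3.
ex = 0.

def color(row):
    # Scalar comparison on one row at a time.
    if (row.A > borderE) and (row.B - row.C < ex):
        return "somestring"
    return "otherstring"

# Row-wise version: calls color() once per row.
row_wise = df.apply(color, axis=1)

# Vectorized version: evaluates both conditions on whole columns.
vectorized = np.where(
    (df.A > borderE) & ((df.B - df.C) < ex),
    "somestring",
    "otherstring",
)

# Both approaches should label every row identically.
same = row_wise.tolist() == vectorized.tolist()
```

Both versions give the same labels here. On large frames the vectorized form is usually faster, since `apply` with `axis=1` runs a Python function for every row.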