Python 使用多个布尔列过滤熊猫数据框

Question

提问by Maya Harary

I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.

我正在尝试使用作为 df 一部分的几个布尔变量来过滤 df，但一直无法这样做。

Sample data:

样本数据：

A | B | C | D
John Doe | 45 | True | False
Jane Smith | 32 | False | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

The dtype for columns C and D is Boolean. I want to create a new df (df1) with only the rows where either C or D is True. It should look like this:

C 列和 D 列的 dtype 是布尔值。我想创建一个新的 df (df1)，其中只有 C 或 D 为 True 的行。它应该是这样的：

A | B | C | D
John Doe | 45 | True | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

I've tried something like this, which faces issues because it cant handle the Boolean type:

我试过这样的事情，它面临问题，因为它无法处理布尔类型：

df1 = df[(df['C']=='True') or (df['D']=='True')]

Any ideas?

有任何想法吗？

Answer 1

回答by MaxU

In [82]: d
Out[82]:
             A   B      C      D
0     John Doe  45   True  False
1   Jane Smith  32  False  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 1:

解决方案1：

In [83]: d.loc[d.C | d.D]
Out[83]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 2:

解决方案2：

In [94]: d[d[['C','D']].any(1)]
Out[94]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 3:

解决方案3：

In [95]: d.query("C or D")
Out[95]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

PS If you change your solution to:

PS如果您将解决方案更改为：

df[(df['C']==True) | (df['D']==True)]

it'll work too

它也会起作用

Pandas docs - boolean indexing

Pandas 文档 - 布尔索引

why we should NOT use "PEP complaint" df["col_name"] is Trueinstead of df["col_name"] == True?

为什么我们不应该使用“PEP 投诉”df["col_name"] is True来代替df["col_name"] == True？

In [11]: df = pd.DataFrame({"col":[True, True, True]})

In [12]: df
Out[12]:
    col
0  True
1  True
2  True

In [13]: df["col"] is True
Out[13]: False               # <----- oops, that's not exactly what we wanted

Answer 2

回答by cs95

Hooray! More options!

万岁！更多选择！

`np.where`

df[np.where(df.C | df.D, True, False)]

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

`pd.Series.where`on `df.index`

`pd.Series.where`在 `df.index`

df.loc[df.index.where(df.C | df.D).dropna()]

               A   B      C      D
0.0     John Doe  45   True  False
2.0  Alan Holmes  55  False   True
3.0   Eric Lamar  29   True   True

`df.select_dtypes`

df[df.select_dtypes([bool]).any(1)]   

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Abusing `np.select`

滥用 `np.select`

df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Answer 3

回答by YOBEN_S

Or

或者

d[d.eval('C or D')]

Out[1065]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Answer 4

回答by rra

So, the easiest way to do this:

所以，最简单的方法是：

students = [ ('Hyman1', 'Apples1' , 341) ,
             ('Riti1', 'Mangos1'  , 311) ,
             ('Aadi1', 'Grapes1' , 301) ,
             ('Sonia1', 'Apples1', 321) ,
             ('Lucy1', 'Mangos1'  , 331) ,
             ('Mike1', 'Apples1' , 351),
              ('Mik', 'Apples1' , np.nan)
              ]
#Create a DataFrame object
df = pd.DataFrame(students, columns = ['Name1' , 'Product1', 'Sale1']) 
print(df)


    Name1 Product1  Sale1
0   Hyman1  Apples1    341
1   Riti1  Mangos1    311
2   Aadi1  Grapes1    301
3  Sonia1  Apples1    321
4   Lucy1  Mangos1    331
5   Mike1  Apples1    351
6     Mik  Apples1    NaN

# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples',
subset = df[df['Product1'] == 'Apples1']
print(subset)

 Name1 Product1  Sale1
0   Hyman1  Apples1    341
3  Sonia1  Apples1    321
5   Mike1  Apples1    351
6     Mik  Apples1    NA

# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples', AND notnull value in Sale

subsetx= df[(df['Product1'] == "Apples1")  & (df['Sale1'].notnull())]
print(subsetx)
    Name1   Product1    Sale1
0   Hyman1   Apples1      341
3   Sonia1  Apples1      321
5   Mike1   Apples1      351

# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples', AND Sale = 351

subsetx= df[(df['Product1'] == "Apples1")  & (df['Sale1'] == 351)]
print(subsetx)

   Name1 Product1  Sale1
5  Mike1  Apples1    351

# Another example
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1']) ]
print(subsetData)

Name1 Product1  Sale1
1  Riti1  Mangos1    311
2  Aadi1  Grapes1    301
4  Lucy1  Mangos1    331

Here is the source of this code: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
I added minor changes to it.

这是此代码的来源：https: //thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
我对其进行了细微的更改。

Answer 5

回答by samuelv

you could try this easily:

你可以很容易地尝试这个：

df1 = df[(df['C']=='True') | (df['D']=='True')]

Note:

笔记：

The orlogical operator needs to be replaced by the bitwise |operator.
Ensure that ()are used to enclose each of the operands.

该or逻辑运算符需要由逐位替换|操作。
确保()用于包含每个操作数。

Python 使用多个布尔列过滤熊猫数据框

提问by Maya Harary

回答by MaxU

回答by cs95

`np.where`

`np.where`

`pd.Series.where`on `df.index`

`pd.Series.where`在 `df.index`

`df.select_dtypes`

`df.select_dtypes`

Abusing `np.select`

滥用 `np.select`

回答by YOBEN_S

回答by rra

So, the easiest way to do this:

所以，最简单的方法是：

回答by samuelv

相关推荐

最近更新

标签

Python 使用多个布尔列过滤熊猫数据框

提问by Maya Harary

回答by MaxU

回答by cs95

np.where

np.where

pd.Series.whereon df.index

pd.Series.where在 df.index

df.select_dtypes

df.select_dtypes

Abusing np.select

滥用 np.select

回答by YOBEN_S

回答by rra

So, the easiest way to do this:

所以，最简单的方法是：

回答by samuelv

相关推荐

Python PyCharm 内存不足

Python cx_Oracle.DatabaseError: DPI-1047: 无法加载 64 位 Oracle 客户端库：“dlopen(libclntsh.dylib, 1): image not found”

Python 使用 seaborn，如何在散点图上画一条我选择的线？

Python 如何显示存储在熊猫数据框中的图像？

相关推荐

最近更新

标签

`np.where`

`np.where`

`pd.Series.where`on `df.index`

`pd.Series.where`在 `df.index`

`df.select_dtypes`

`df.select_dtypes`

Abusing `np.select`

滥用 `np.select`