Python 使用多个布尔列过滤熊猫数据框
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/46207530/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Filtering pandas dataframe with multiple Boolean columns
提问by Maya Harary
I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.
我正在尝试使用作为 df 一部分的几个布尔变量来过滤 df,但一直无法这样做。
Sample data:
样本数据:
A | B | C | D
John Doe | 45 | True | False
Jane Smith | 32 | False | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True
The dtype for columns C and D is Boolean. I want to create a new df (df1) with only the rows where either C or D is True. It should look like this:
C 列和 D 列的 dtype 是布尔值。我想创建一个新的 df (df1),其中只有 C 或 D 为 True 的行。它应该是这样的:
A | B | C | D
John Doe | 45 | True | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True
I've tried something like this, which faces issues because it cant handle the Boolean type:
我试过这样的事情,它面临问题,因为它无法处理布尔类型:
df1 = df[(df['C']=='True') or (df['D']=='True')]
Any ideas?
有任何想法吗?
回答by MaxU
In [82]: d
Out[82]:
A B C D
0 John Doe 45 True False
1 Jane Smith 32 False False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
Solution 1:
解决方案1:
In [83]: d.loc[d.C | d.D]
Out[83]:
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
Solution 2:
解决方案2:
In [94]: d[d[['C','D']].any(1)]
Out[94]:
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
Solution 3:
解决方案3:
In [95]: d.query("C or D")
Out[95]:
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
PS If you change your solution to:
PS如果您将解决方案更改为:
df[(df['C']==True) | (df['D']==True)]
it'll work too
它也会起作用
Pandas docs - boolean indexing
why we should NOT use "PEP complaint"
df["col_name"] is True
instead ofdf["col_name"] == True
?
为什么我们不应该使用“PEP 投诉”
df["col_name"] is True
来代替df["col_name"] == True
?
In [11]: df = pd.DataFrame({"col":[True, True, True]})
In [12]: df
Out[12]:
col
0 True
1 True
2 True
In [13]: df["col"] is True
Out[13]: False # <----- oops, that's not exactly what we wanted
回答by cs95
Hooray! More options!
万岁!更多选择!
np.where
np.where
df[np.where(df.C | df.D, True, False)]
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
pd.Series.where
on df.index
pd.Series.where
在 df.index
df.loc[df.index.where(df.C | df.D).dropna()]
A B C D
0.0 John Doe 45 True False
2.0 Alan Holmes 55 False True
3.0 Eric Lamar 29 True True
df.select_dtypes
df.select_dtypes
df[df.select_dtypes([bool]).any(1)]
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
Abusing np.select
滥用 np.select
df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
回答by YOBEN_S
Or
或者
d[d.eval('C or D')]
Out[1065]:
A B C D
0 John Doe 45 True False
2 Alan Holmes 55 False True
3 Eric Lamar 29 True True
回答by rra
So, the easiest way to do this:
所以,最简单的方法是:
students = [ ('Hyman1', 'Apples1' , 341) ,
('Riti1', 'Mangos1' , 311) ,
('Aadi1', 'Grapes1' , 301) ,
('Sonia1', 'Apples1', 321) ,
('Lucy1', 'Mangos1' , 331) ,
('Mike1', 'Apples1' , 351),
('Mik', 'Apples1' , np.nan)
]
#Create a DataFrame object
df = pd.DataFrame(students, columns = ['Name1' , 'Product1', 'Sale1'])
print(df)
Name1 Product1 Sale1
0 Hyman1 Apples1 341
1 Riti1 Mangos1 311
2 Aadi1 Grapes1 301
3 Sonia1 Apples1 321
4 Lucy1 Mangos1 331
5 Mike1 Apples1 351
6 Mik Apples1 NaN
# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples',
subset = df[df['Product1'] == 'Apples1']
print(subset)
Name1 Product1 Sale1
0 Hyman1 Apples1 341
3 Sonia1 Apples1 321
5 Mike1 Apples1 351
6 Mik Apples1 NA
# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples', AND notnull value in Sale
subsetx= df[(df['Product1'] == "Apples1") & (df['Sale1'].notnull())]
print(subsetx)
Name1 Product1 Sale1
0 Hyman1 Apples1 341
3 Sonia1 Apples1 321
5 Mike1 Apples1 351
# Select rows in above DataFrame for which ‘Product' column contains the value ‘Apples', AND Sale = 351
subsetx= df[(df['Product1'] == "Apples1") & (df['Sale1'] == 351)]
print(subsetx)
Name1 Product1 Sale1
5 Mike1 Apples1 351
# Another example
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1']) ]
print(subsetData)
Name1 Product1 Sale1
1 Riti1 Mangos1 311
2 Aadi1 Grapes1 301
4 Lucy1 Mangos1 331
Here is the source of this code: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
I added minor changes to it.
这是此代码的来源:https: //thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
我对其进行了细微的更改。
回答by samuelv
you could try this easily:
你可以很容易地尝试这个:
df1 = df[(df['C']=='True') | (df['D']=='True')]
Note:
笔记:
- The
or
logical operator needs to be replaced by the bitwise|
operator. - Ensure that
()
are used to enclose each of the operands.
- 该
or
逻辑运算符需要由逐位替换|
操作。 - 确保
()
用于包含每个操作数。