Pandas 使用 bool 过滤 DataFrame 的列

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/37391539/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 01:16:24  来源:igfitidea点击:

Pandas filter columns of a DataFrame with bool

pythonpandasdataframe

提问by mohitos

For a DataFrame (df) with multiple columns and rows

对于具有多列和多行的 DataFrame (df)

     A   B  C  D
0    1   4  2  6
1    2   5  7  4
2    3   6  5  6

and another DataFrame (dfBool) containing dtype: bool

和另一个包含 dtype: bool 的 DataFrame (dfBool)

0  True
1  False
2  False
3  True

What is the easiest way to split this DataFrame by columns into two different DataFrames by transposing dfbool so you get the desired output

通过转置 dfbool 将这个 DataFrame 按列拆分为两个不同的 DataFrame 的最简单方法是什么,以便获得所需的输出

     A   D
0    1   6
1    2   4
2    3   6 

     B  C 
0    4  2  
1    5  7  
2    6  5  

I cannot understand, in my limited experience why dfTrue = df[dfBool.transpose() == True]does not work

我无法理解,以我有限的经验为什么dfTrue = df[dfBool.transpose() == True]不起作用

回答by jezrael

I would like to modify EdChum's comment, because if dfBoolis DataFrame, you have to first select column:

我想修改EdChum 的评论,因为如果dfBoolDataFrame,你必须先选择column

import pandas as pd

df = pd.DataFrame({'D': {0: 6, 1: 4, 2: 6},
                    'A': {0: 1, 1: 2, 2: 3},
                    'C': {0: 2, 1: 7, 2: 5},
                    'B': {0: 4, 1: 5, 2: 6}})
print (df)
   A  B  C  D
0  1  4  2  6
1  2  5  7  4
2  3  6  5  6

dfBool = pd.DataFrame({'a':[True, False, False, True]})
print (dfBool)
       a
0   True
1  False
2  False
3   True
#select first column in dfBool
df2 = (dfBool.iloc[:,0])
#or select column a in dfBool
#df2 = (dfBool.a)
print (df2)
0     True
1    False
2    False
3     True
Name: a, dtype: bool

print (df[df.columns[df2]])
   A  D
0  1  6
1  2  4
2  3  6

print (df[df.columns[~df2]])
   B  C
0  4  2
1  5  7
2  6  5

Another very nice solution from ayhan, thank you:

来自ayhan 的另一个非常好的解决方案,谢谢:

print (df.loc[:, dfBool.a.values])
   A  D
0  1  6
1  2  4
2  3  6

print (df.loc[:, ~dfBool.a.values])
   B  C
0  4  2
1  5  7
2  6  5

But if dfBoolis Series, solution works very well:

但如果dfBoolSeries,解决方案效果很好:

dfBool = pd.Series([True, False, False, True])
print (dfBool)

0     True
1    False
2    False
3     True
dtype: bool

print (df[df.columns[dfBool]])
   A  D
0  1  6
1  2  4
2  3  6

print (df[df.columns[~dfBool]])
   B  C
0  4  2
1  5  7
2  6  5

And for Series:

而对于Series

print (df.loc[:, dfBool.values])
   A  D
0  1  6
1  2  4
2  3  6

print (df.loc[:, ~dfBool.values])
   B  C
0  4  2
1  5  7
2  6  5

Timings:

时间

In [277]: %timeit (df[df.columns[dfBool.a]])
1000 loops, best of 3: 769 μs per loop

In [278]: %timeit (df.loc[:, dfBool1.a.values])
The slowest run took 9.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 380 μs per loop

In [279]: %timeit (df.transpose()[dfBool1.a.values].transpose())
The slowest run took 5.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 550 μs per loop

Code for timings:

计时代码

import pandas as pd

df = pd.DataFrame({'D': {0: 6, 1: 4, 2: 6},
                    'A': {0: 1, 1: 2, 2: 3},
                    'C': {0: 2, 1: 7, 2: 5},
                    'B': {0: 4, 1: 5, 2: 6}})
print (df)
df = pd.concat([df]*1000, axis=1).reset_index(drop=True)

dfBool = pd.DataFrame({'a': [True, False, False, True]})
dfBool1 = pd.concat([dfBool]*1000).reset_index(drop=True)

Output is little different:

输出略有不同:

print (df[df.columns[dfBool.a]])
   A  A  A  A  A  A  A  A  A  A ...  D  D  D  D  D  D  D  D  D  D
0  1  1  1  1  1  1  1  1  1  1 ...  6  6  6  6  6  6  6  6  6  6
1  2  2  2  2  2  2  2  2  2  2 ...  4  4  4  4  4  4  4  4  4  4
2  3  3  3  3  3  3  3  3  3  3 ...  6  6  6  6  6  6  6  6  6  6

[3 rows x 2000 columns]

print (df.loc[:, dfBool1.a.values])
   A  D  A  D  A  D  A  D  A  D ...  A  D  A  D  A  D  A  D  A  D
0  1  6  1  6  1  6  1  6  1  6 ...  1  6  1  6  1  6  1  6  1  6
1  2  4  2  4  2  4  2  4  2  4 ...  2  4  2  4  2  4  2  4  2  4
2  3  6  3  6  3  6  3  6  3  6 ...  3  6  3  6  3  6  3  6  3  6

[3 rows x 2000 columns]

print (df.transpose()[dfBool1.a.values].transpose())
   A  D  A  D  A  D  A  D  A  D ...  A  D  A  D  A  D  A  D  A  D
0  1  6  1  6  1  6  1  6  1  6 ...  1  6  1  6  1  6  1  6  1  6
1  2  4  2  4  2  4  2  4  2  4 ...  2  4  2  4  2  4  2  4  2  4
2  3  6  3  6  3  6  3  6  3  6 ...  3  6  3  6  3  6  3  6  3  6

[3 rows x 2000 columns]

回答by Greg Lever

Maybe something like the following ?

也许类似于以下内容?

import pandas as pd
totalDF = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [2, 7, 5], 'D': [6, 4, 8]})

dfBool = pd.DataFrame(data=[True, False, False, True])

totalDF.transpose()[dfBool.values].transpose()


   A  D
0  1  6
1  2  4
2  3  8

totalDF.transpose()[~dfBool.values].transpose()

   B  C
0  4  2
1  5  7
2  6  5