Python Pandas: Select DataFrame columns using boolean

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29281815/

Time: 2020-08-19 04:18:55  Source: igfitidea

Pandas Select DataFrame columns using boolean

python pandas

Asked by dartdog

I want to use a boolean to select the columns with more than 4000 entries from a dataframe comb, which has over 1,000 columns. This expression gives me a Boolean (True/False) result:

criteria = comb.ix[:,'c_0327':].count()>4000

I want to use it to select only the True columns into a new DataFrame.
The following just gives me "Unalignable boolean Series key provided":

comb.loc[criteria,]

I also tried:

comb.ix[:, comb.ix[:,'c_0327':].count()>4000] 

This is similar to the answer to the question "dataframe boolean selection along columns instead of row", but it gives me the same error: "Unalignable boolean Series key provided".

comb.ix[:,'c_0327':].count()>4000

yields:

c_0327    False
c_0328    False
c_0329    False
c_0330    False
c_0331    False
c_0332    False
c_0333    False
c_0334    False
c_0335    False
c_0336    False
c_0337     True
c_0338    False
.....

Accepted answer by EdChum

What is returned is a Series with the column names as the index and the boolean values as the row values.

I think what you actually want is the following, which should now work:

comb[criteria.index[criteria]]

Basically, this takes the index values of criteria and masks them with its boolean values. The result is an array of column names, which we can use to select the columns of interest from the original df.
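
As a minimal sketch of the masking step described above, assuming a small made-up frame in place of comb (the column values and the threshold of 3 are illustrative, not from the question). It uses .loc for the label slice because .ix was removed in pandas 1.0:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for comb: an id column followed by data columns
# that contain missing values.
comb = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "c_0327": [1.0, np.nan, np.nan, np.nan, 5.0],
    "c_0328": [1.0, 2.0, 3.0, 4.0, np.nan],
    "c_0329": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Boolean Series indexed by column name; the threshold 3 stands in for 4000.
criteria = comb.loc[:, "c_0327":].count() > 3

# Masking the index with its own boolean values keeps only the labels marked True.
selected_columns = criteria.index[criteria]

# Plain label-based column selection with the surviving names.
subset = comb[selected_columns]
print(list(subset.columns))  # ['c_0328', 'c_0329']
```

Because the selection is done by label, it does not matter that criteria covers only some of the columns of comb.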

Answer by Yohan Obadia

You can also use:

# To filter columns (assuming criteria length is equal to the number of columns of comb)
comb.loc[:, criteria]  # the original answer used comb.ix, which was removed in pandas 1.0
comb.iloc[:, criteria.values]

# To filter rows (assuming criteria length is equal to the number of rows of comb)
comb[criteria]
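
Both comments above assume criteria has one entry per column (or row) of comb. In the question, criteria was built from a slice starting at 'c_0327', so it may be shorter than the column list. One possible way to handle that, sketched here with made-up data, is to pad criteria with False for the columns outside the slice before indexing. The names full_criteria, by_label and by_position are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for comb.
comb = pd.DataFrame({
    "id": [1, 2, 3],
    "c_0327": [1.0, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0],
})

# Built from a slice of the columns, so it has fewer entries than comb.columns.
criteria = comb.loc[:, "c_0327":].count() > 2

# Pad to one entry per column; columns outside the slice are marked False here.
full_criteria = criteria.reindex(comb.columns, fill_value=False).astype(bool)

# Label-based selection with a boolean Series aligned to the columns.
by_label = comb.loc[:, full_criteria]

# Position-based selection needs a plain boolean array rather than a Series.
by_position = comb.iloc[:, full_criteria.to_numpy()]
```

Whether the columns outside the slice should be dropped (False) or kept (True) depends on what the caller wants; False is only the choice made in this sketch.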

Answer by Krishna

I'm using this; it's cleaner:

comb.values[:,criteria]

credit: https://stackoverflow.com/a/43291257/815677
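
One thing to keep in mind with this approach: .values is a NumPy array, so the result is also a NumPy array and the column labels are lost. A small sketch with made-up data, passing criteria.to_numpy() so that a plain boolean array is used for the mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame and a criteria Series with one entry per column.
comb = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
criteria = pd.Series([True, False, True], index=comb.columns)

# Indexing the underlying array returns an ndarray without labels.
as_array = comb.values[:, criteria.to_numpy()]

# Indexing through .loc returns a DataFrame that keeps the column names.
as_frame = comb.loc[:, criteria]

print(type(as_array).__name__)
print(list(as_frame.columns))
```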

Answer by johnDanger

In pandas 0.25:

comb.loc[:, criteria]

Returns a DataFrame with the columns selected by the Boolean list or Series.

For anyone trying to use multiple criteria:

comb.loc[:, criteria1 & criteria2]

Note: Using and here in place of & DOES NOT work. This is because and attempts to determine the Boolean value of the entire array, while & operates element-wise. This is discussed in "Logical operators for boolean indexing in Pandas".
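
To illustrate the difference, here is a minimal sketch with made-up data; criteria1 and criteria2 are hypothetical conditions, not the ones from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing values.
comb = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, np.nan, np.nan],
    "c": [7.0, 8.0, np.nan],
})

criteria1 = comb.count() > 1   # columns with more than one non-missing entry
criteria2 = comb.sum() > 10    # columns whose values sum to more than 10

# & combines the two boolean Series element by element.
both = comb.loc[:, criteria1 & criteria2]

# `and` asks for a single truth value of a whole Series, which pandas refuses.
try:
    criteria1 and criteria2
    and_raised = False
except ValueError:
    and_raised = True

print(list(both.columns))
print(and_raised)
```

The same reasoning applies to | versus or, and to ~ versus not.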

Answer by Seth Johnson

Another solution is to transpose comb so that its columns act as its index, then transpose the resulting subset back:

comb.T[criteria].T

Again, not particularly elegant, but at least shorter/less repetitive than the leading solution.
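
A possible caveat with the double transpose, sketched below with made-up data: transposing a frame whose columns have different dtypes forces a common dtype, so the round trip may not give back the original dtypes, whereas .loc leaves them alone.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
comb = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
criteria = pd.Series([True, False], index=comb.columns)

# Transpose, filter the rows (former columns), transpose back.
via_transpose = comb.T[criteria].T

# Direct boolean selection along the columns.
via_loc = comb.loc[:, criteria]

print(via_transpose["a"].dtype)
print(via_loc["a"].dtype)
```

Here the integer column comes back as floating point after the round trip, because the transposed frame had to hold integers and floats in the same columns.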