Pandas: Select DataFrame columns using a boolean
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/29281815/
Asked by dartdog
I want to use a boolean to select the columns with more than 4000 entries from a dataframe comb, which has over 1,000 columns. This expression gives me a Boolean (True/False) result:
criteria = comb.ix[:,'c_0327':].count()>4000
I want to use it to select only the True columns into a new DataFrame.
The following just gives me "Unalignable boolean Series key provided":
comb.loc[criteria,]
I also tried:
comb.ix[:, comb.ix[:,'c_0327':].count()>4000]
This is similar to the answer to the question "dataframe boolean selection along columns instead of row", but that gives me the same error: "Unalignable boolean Series key provided"
comb.ix[:,'c_0327':].count()>4000
yields:
c_0327 False
c_0328 False
c_0329 False
c_0330 False
c_0331 False
c_0332 False
c_0333 False
c_0334 False
c_0335 False
c_0336 False
c_0337 True
c_0338 False
.....
Accepted answer by EdChum
What is returned is a Series with the column names as the index and the boolean values as the row values.
I think what you actually want is the following, which should now work:
comb[criteria.index[criteria]]
Basically, this uses the boolean values in criteria to mask its index values, which returns an array of column names. We can then use that array to select the columns of interest from the original df.
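A minimal sketch of the masking step described above may make it more concrete. The small frame standing in for comb, its column values, and the threshold of 2 are assumptions made up for illustration (the original frame has over 1,000 columns and a threshold of 4000), and .loc is used for the label slice in place of .ix, which recent pandas versions no longer provide.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `comb`: columns with differing non-null counts.
comb = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "c_0327": [1.0, np.nan, np.nan, np.nan],
    "c_0328": [1.0, 2.0, 3.0, np.nan],
    "c_0329": [1.0, 2.0, 3.0, 4.0],
})

# Count non-null entries per column from 'c_0327' onward (label slices are inclusive).
criteria = comb.loc[:, "c_0327":].count() > 2

# Masking the index with its own boolean values keeps only the True column names.
selected_names = criteria.index[criteria]

# Use those names to pick the columns out of the original frame.
subset = comb[selected_names]
print(list(subset.columns))
```

Because the selection goes through column names, it does not matter that criteria covers only part of the columns of comb.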
Answer by Yohan Obadia
You can also use:
# To filter columns (assuming criteria length is equal to the number of columns of comb)
comb.ix[:, criteria]
comb.iloc[:, criteria.values]
# To filter rows (assuming criteria length is equal to the number of rows of comb)
comb[criteria]
Answer by Krishna
I'm using this; it's cleaner:
comb.values[:,criteria]
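One thing worth checking with this approach is the type of the result: indexing the underlying array gives back a NumPy array, so the column labels are not carried along. The sketch below illustrates this on a made-up three-column frame (the frame and the mask are assumptions, not the original data), using to_numpy() as the counterpart of .values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `comb`, with a mask covering every column.
comb = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
criteria = pd.Series([True, False, True], index=comb.columns)

# Indexing the raw array: the result is a plain ndarray without column names.
as_array = comb.to_numpy()[:, criteria.to_numpy()]

# Indexing through .loc with the same mask keeps a labelled DataFrame.
as_frame = comb.loc[:, criteria.to_numpy()]

print(type(as_array).__name__, as_array.shape)
print(list(as_frame.columns))
```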
Answer by johnDanger
In pandas 0.25:
comb.loc[:, criteria]
Returns a DataFrame with columns selected by the Boolean list or Series.
For anyone trying to use multiple criteria:
comb.loc[:, criteria1 & criteria2]
Note: Using and here in place of & DOES NOT work. This is because and attempts to determine the Boolean value of the entire array, while & operates element-wise. This is discussed in "Logical operators for boolean indexing in Pandas".
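The difference between & and and can be seen on a small example. The frame below and the two conditions (criteria1, criteria2) are invented for illustration; both masks are computed over every column so that their index lines up with the columns of comb.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `comb`.
comb = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, np.nan, np.nan],
    "c": [7.0, 8.0, np.nan],
})

criteria1 = comb.count() >= 2   # non-null counts: a=3, b=1, c=2
criteria2 = comb.sum() < 10     # sums (NaN skipped): a=6, b=4, c=15

# `&` combines the two masks element by element.
both = comb.loc[:, criteria1 & criteria2]
print(list(both.columns))

# `and` asks for the truth value of a whole Series, which pandas refuses to give.
try:
    criteria1 and criteria2
    and_failed = False
except ValueError as exc:
    and_failed = True
    print("ValueError:", exc)
```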
Answer by Seth Johnson
Another solution is to transpose comb to make its columns act as its index, then transpose the resulting subset back:
comb.T[criteria].T
Again, not particularly elegant, but at least shorter/less repetitive than the leading solution.
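As a rough sketch of the transpose approach, assuming a small made-up frame and a mask whose index matches the columns of comb: after the first transpose the column names become row labels, so the boolean Series filters rows, and the second transpose restores the original orientation. One caveat to keep in mind is that transposing a frame with mixed column dtypes can upcast the data to object, so this sketch sticks to all-integer columns.

```python
import pandas as pd

# Hypothetical stand-in for `comb`, with uniformly typed columns.
comb = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
criteria = pd.Series([False, True, True], index=comb.columns)

# Transpose, filter the rows (formerly columns) by the mask, transpose back.
result = comb.T[criteria].T
print(list(result.columns))
```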