Pandas Python, select columns based on rows conditions
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/38117400/
Asked by hans glick
I have a dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
0 1 2 3
0 1.489198 1.329603 1.590124 1.123505
1 0.024017 0.581033 2.500397 0.156280
I want to select the columns that have at least one row with a value greater than 2. I tried the following, but it did not work as expected:
df[df.columns[df.iloc[(0,1)]>2]]
In this toy example, my expected output would be:
2
1.590124
2.500397
Answered by EdChum
Use gt and any to filter the df:
In [287]:
df.loc[:, df.gt(2).any()]
Out[287]:
2
0 1.590124
1 2.500397
Here we use loc (the original answer used ix, which was deprecated in pandas 0.20 and removed in pandas 1.0). The first argument, :, selects all rows, and the second argument is a boolean mask of the columns that meet the condition:
In [288]:
df.gt(2)
Out[288]:
0 1 2 3
0 False False False False
1 False False True False
In [289]:
df.gt(2).any()
Out[289]:
0 False
1 False
2 True
3 False
dtype: bool
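The three steps above (compare, reduce per column, select) can be put together in one short script. This is a minimal sketch, assuming the values printed in the question (typed in by hand rather than regenerated) and using loc, since ix is not available in pandas 1.0 and later. The names mask, keep and selected are illustrative only.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded as displayed).
df = pd.DataFrame(
    [[1.489198, 1.329603, 1.590124, 1.123505],
     [0.024017, 0.581033, 2.500397, 0.156280]]
)

# Element-wise comparison: True wherever a cell is greater than 2.
mask = df.gt(2)

# any() reduces down the rows (axis=0 by default): one boolean per column.
keep = mask.any()

# ":" keeps every row; the boolean Series picks the columns to keep.
selected = df.loc[:, keep]
print(selected)
```

Only column 2 contains a value above 2, so it is the only column left in selected.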
In your example, df.iloc[(0,1)] selects the single cell value in the first row and second column. You then tried to use that scalar to mask the columns, but this just returned the first column, which is why it didn't work:
In [291]:
df.iloc[(0,1)]
Out[291]:
1.3296030000000001
In [293]:
df.columns[df.iloc[(0,1)]>2]
Out[293]:
'0'
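To make the difference concrete, the sketch below contrasts the single boolean produced by the original attempt with the per-column mask that the selection needs. It assumes the values printed in the question; the names cell, flag and col_mask are illustrative. It deliberately stops short of indexing df.columns with the scalar, because how an Index treats a bare boolean may differ between pandas versions.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded as displayed).
df = pd.DataFrame(
    [[1.489198, 1.329603, 1.590124, 1.123505],
     [0.024017, 0.581033, 2.500397, 0.156280]]
)

# df.iloc[(0, 1)] is the same as df.iloc[0, 1]: one cell, not a row or column.
cell = df.iloc[(0, 1)]
print(cell)

# Comparing a scalar gives one boolean, which says nothing about other columns.
flag = cell > 2
print(flag)

# What the selection needs instead: one boolean per column.
col_mask = (df > 2).any()
print(col_mask.tolist())
```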
Answered by jezrael
Use a boolean mask created with df > 2 and any, then select the columns with loc (the original answer used ix, which has since been removed from pandas):
import pandas as pd
import numpy as np
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
0 1 2 3
0 0.079428 2.190202 -0.134892 0.160518
1 0.442698 0.623391 1.008903 0.394249
print((df > 2).any())
0 False
1 True
2 False
3 False
dtype: bool
print(df.loc[:, (df > 2).any()])
1
0 2.190202
1 0.623391
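As a cross-check between the two answers, the comparison operator and the gt method should produce the same mask. A minimal sketch, assuming the values printed above (typed in by hand instead of relying on the seed) and illustrative variable names:

```python
import pandas as pd

# Values copied from the printed frame above (seed 18), rounded as displayed.
df = pd.DataFrame(
    [[0.079428, 2.190202, -0.134892, 0.160518],
     [0.442698, 0.623391, 1.008903, 0.394249]]
)

# The comparison operator and the gt method build the same boolean frame.
same_mask = (df > 2).equals(df.gt(2))
print(same_mask)

# any() defaults to axis=0: it collapses the rows into one flag per column.
per_column = (df > 2).any(axis=0)

# Passing the flags to .loc keeps only the flagged columns, here column 1.
result = df.loc[:, per_column]
print(result)
```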
EDIT by comment:
You can check your solution step by step:
It seems to work, but it always selects the second column (1, because Python counts from 0) if the condition is True:
print (df.iloc[(0,1)])
2.19020235741
print (df.iloc[(0,1)] > 2)
True
print (df.columns[df.iloc[(0,1)]>2])
1
print (df[df.columns[df.iloc[(0,1)]>2]])
0 2.190202
1 0.623391
Name: 1, dtype: float64
And it selects the first column (0) if the condition is False, because the booleans True and False are cast to 1 and 0:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
0 1 2 3
0 -0.312328 0.339285 -0.155909 -0.501790
1 0.235569 -1.763605 -1.095862 -1.087766
print (df.iloc[(0,1)])
0.339284706046
print (df.iloc[(0,1)] > 2)
False
print (df.columns[df.iloc[(0,1)]>2])
0
print (df[df.columns[df.iloc[(0,1)]>2]])
0 -0.312328
1 0.235569
Name: 0, dtype: float64
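The cast described here is ordinary Python behaviour: bool is a subclass of int. The sketch below shows it with a plain list standing in for the column labels, which is an assumption made for illustration. Whether a pandas Index accepts a bare boolean in the same way may depend on the pandas version, so the list is used to keep the example portable.

```python
# A plain list standing in for the column labels 0..3 (illustrative only).
cols = [0, 1, 2, 3]

# bool is a subclass of int, so True and False behave like 1 and 0.
print(isinstance(True, int))
print(int(True), int(False))

# Used as a position, True picks the second item and False picks the first.
picked_if_true = cols[True]
picked_if_false = cols[False]
print(picked_if_true, picked_if_false)
```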
If you change the column names:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
a b c d
0 -0.312328 0.339285 -0.155909 -0.501790
1 0.235569 -1.763605 -1.095862 -1.087766
print (df.iloc[(0,1)] > 2)
False
print (df[df.columns[df.iloc[(0,1)]>2]])
0 -0.312328
1 0.235569
Name: a, dtype: float64
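For comparison, the mask-based selection from the answers above works with named columns too, because the boolean Series returned by any is aligned on the column labels rather than on positions. A minimal sketch, assuming the values printed above; the helper columns_with_any_above and the threshold 0.3 are hypothetical and chosen only so that one column qualifies.

```python
import pandas as pd


def columns_with_any_above(frame, threshold):
    """Hypothetical helper: keep columns where at least one row exceeds threshold."""
    return frame.loc[:, (frame > threshold).any()]


# Values copied from the printed frame above (seed 15), rounded as displayed.
df = pd.DataFrame(
    [[-0.312328, 0.339285, -0.155909, -0.501790],
     [0.235569, -1.763605, -1.095862, -1.087766]],
    columns=["a", "b", "c", "d"],
)

# Only column "b" has a value above the illustrative threshold 0.3.
above = columns_with_any_above(df, 0.3)
print(list(above.columns))

# With the question's threshold of 2, no column qualifies and the result is empty.
empty = columns_with_any_above(df, 2)
print(empty.shape)
```

Unlike indexing df.columns with a single boolean, this never falls back to the first or second column: if no column meets the condition, the result simply has no columns.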