Note: this page is a bilingual translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/38117400/

Date: 2020-09-14 01:29:25  Source: igfitidea

Pandas Python, select columns based on rows conditions

Tags: python, pandas, dataframe, conditional-statements

Asked by hans glick

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  1.489198  1.329603  1.590124  1.123505
1  0.024017  0.581033  2.500397  0.156280

I want to select the columns in which at least one row has a value greater than 2. I tried the following, but it did not work as expected.

df[df.columns[df.iloc[(0,1)]>2]]

In this toy example my expected output would be:

       2
1.590124  
2.500397 

Answered by EdChum

Use gt and any to filter the df:

In [287]:
df.loc[:, df.gt(2).any()]  # .loc replaces .ix, which was removed in pandas 1.0

Out[287]:
          2
0  1.590124
1  2.500397

Here we use loc with : as the first argument to select all rows; the second argument is a boolean mask of the columns that meet the condition:

In [288]:
df.gt(2)

Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]:
df.gt(2).any()

Out[289]:
0    False
1    False
2     True
3    False
dtype: bool
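
The steps above can be combined into one self-contained sketch. The values are copied from the question's toy frame rather than drawn at random, so the result is reproducible; `mask` and `selected` are names chosen here for illustration.

```python
import pandas as pd

# Values copied from the question's toy example so the result is reproducible.
df = pd.DataFrame([[1.489198, 1.329603, 1.590124, 1.123505],
                   [0.024017, 0.581033, 2.500397, 0.156280]])

# Element-wise comparison, then a reduction down each column:
# True where at least one row in that column exceeds 2.
mask = df.gt(2).any()

# Keep all rows (:) and only the columns where the mask is True.
selected = df.loc[:, mask]
print(selected)
```

Here only column 2 survives, because 2.500397 is the only value above 2.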

In your example, you selected the cell value for the first row and second column and then tried to use the result of the comparison to mask the columns. That comparison yields a single boolean rather than one boolean per column, so it just returned the first column label, which is why it didn't work:

In [291]:
df.iloc[(0,1)]

Out[291]:
1.3296030000000001

In [293]:
df.columns[df.iloc[(0,1)]>2]

Out[293]:
'0'
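
A small check, using the same values as the question, may make the difference concrete: `df.iloc[(0, 1)]` yields one scalar, so comparing it with 2 gives a single boolean, whereas `df.gt(2).any()` gives one boolean per column. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1.489198, 1.329603, 1.590124, 1.123505],
                   [0.024017, 0.581033, 2.500397, 0.156280]])

# A tuple inside iloc addresses one cell: row position 0, column position 1.
cell = df.iloc[(0, 1)]
single_flag = bool(cell > 2)     # one boolean for one cell

# A usable column mask needs one boolean per column.
column_mask = df.gt(2).any()

print(cell, single_flag)
print(len(column_mask) == df.shape[1])
```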

Answered by jezrael

Build a boolean mask with df > 2 and any, then select the columns with loc:

import numpy as np
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  0.079428  2.190202 -0.134892  0.160518
1  0.442698  0.623391  1.008903  0.394249

print((df > 2).any())
0    False
1     True
2    False
3    False
dtype: bool

# .loc replaces .ix, which was removed in pandas 1.0
print(df.loc[:, (df > 2).any()])
          1
0  2.190202
1  0.623391
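
If the same selection is needed with different thresholds, it could be wrapped in a small helper. This is a sketch only: `columns_with_any_above` is a hypothetical name, not a pandas function, and the frame reuses the values printed above instead of relying on the random seed.

```python
import pandas as pd


def columns_with_any_above(frame, threshold):
    """Return the columns of `frame` holding at least one value > threshold."""
    # (frame > threshold) is an element-wise boolean frame;
    # .any(axis=0) collapses the rows, leaving one boolean per column.
    mask = (frame > threshold).any(axis=0)
    return frame.loc[:, mask]


df = pd.DataFrame([[0.079428, 2.190202, -0.134892, 0.160518],
                   [0.442698, 0.623391, 1.008903, 0.394249]])

print(columns_with_any_above(df, 2))   # only column 1 has a value above 2
print(columns_with_any_above(df, 1))   # columns 1 and 2 have a value above 1
```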

Edit (in response to a comment):

You can check your solution step by step:

It seems to work, but it always selects the second column (label 1, since Python counts from 0) when the condition is True:

print (df.iloc[(0,1)])
2.19020235741

print (df.iloc[(0,1)] > 2)
True

print (df.columns[df.iloc[(0,1)]>2])
1

print (df[df.columns[df.iloc[(0,1)]>2]])
0    2.190202
1    0.623391
Name: 1, dtype: float64

And it selects the first column (label 0) when the condition is False, because the booleans True and False are cast to 1 and 0:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
          0         1         2         3
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)])
0.339284706046

print (df.iloc[(0,1)] > 2)
False

print (df.columns[df.iloc[(0,1)]>2])
0

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: 0, dtype: float64
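
The cast can be seen in plain Python, independent of pandas: `bool` is a subclass of `int`, so `True` and `False` behave as the positions 1 and 0 when used to index a sequence. The list `labels` below is a stand-in for the column labels, introduced only for illustration.

```python
# Stand-in for the column labels of the frame (illustrative only).
labels = [0, 1, 2, 3]

# bool is a subclass of int, so True == 1 and False == 0.
print(int(True), int(False))

# Indexing a sequence with a boolean therefore picks position 1 or 0,
# no matter which column actually satisfied the condition.
picked_when_true = labels[True]
picked_when_false = labels[False]
print(picked_when_true, picked_when_false)
```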

If you change the column names, the same positional selection happens, and column a is returned:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
          a         b         c         d
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)] > 2)
False

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: a, dtype: float64
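
For comparison, a per-column mask applied to the same renamed frame selects by condition rather than by position. In this data no value exceeds 2, so the sketch below yields a frame with no columns instead of returning column `a`; the lower threshold of 0.3 is an arbitrary choice made here for illustration.

```python
import pandas as pd

df = pd.DataFrame([[-0.312328, 0.339285, -0.155909, -0.501790],
                   [0.235569, -1.763605, -1.095862, -1.087766]],
                  columns=['a', 'b', 'c', 'd'])

# No value is greater than 2, so every entry of the mask is False
# and the selection keeps both rows but zero columns.
none_selected = df.loc[:, (df > 2).any()]
print(none_selected.shape)

# With a lower, arbitrary threshold the mask picks columns by label.
some_selected = df.loc[:, (df > 0.3).any()]
print(list(some_selected.columns))
```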