Pandas Python: select columns based on row conditions

Question

Asked by hans glick

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  1.489198  1.329603  1.590124  1.123505
1  0.024017  0.581033  2.500397  0.156280

I want to select the columns that have at least one row with a value greater than 2. I tried the following, but it did not work as expected.

df[df.columns[df.iloc[(0,1)]>2]]

In this toy example my expected output would be:

          2
0  1.590124
1  2.500397

Answer 1

Answered by EdChum

Use gt and any to filter the df:

In [287]:
df.loc[:, df.gt(2).any()]

Out[287]:
          2
0  1.590124
1  2.500397

Here we use loc: the first argument, :, selects all rows, and the second is a boolean mask of the columns that meet the condition:

In [288]:
df.gt(2)

Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]:
df.gt(2).any()

Out[289]:
0    False
1    False
2     True
3    False
dtype: bool
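The two steps shown above (build a per-column mask, then select with it) can be combined into a small self-contained sketch. The values are copied from the question's frame so the result is reproducible; the helper name `columns_with_any_above` is hypothetical and not part of pandas, and `.loc` is assumed as the selector.

```python
import pandas as pd

# Values copied from the question's example frame.
df = pd.DataFrame(
    [[1.489198, 1.329603, 1.590124, 1.123505],
     [0.024017, 0.581033, 2.500397, 0.156280]]
)


def columns_with_any_above(frame, threshold):
    """Hypothetical helper: keep columns with at least one value > threshold."""
    mask = frame.gt(threshold).any()  # one boolean per column
    return frame.loc[:, mask]         # all rows, only the masked columns


result = columns_with_any_above(df, 2)
print(result)
```

Only column 2 contains a value above 2, so it should be the only column kept.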

In your example, you selected the cell value at the first row and second column and compared it with 2. That yields a single boolean rather than a per-column mask, so using it to index the columns just returned the first column, which is why it didn't work:

In [291]:
df.iloc[(0,1)]

Out[291]:
1.3296030000000001

In [293]:
df.columns[df.iloc[(0,1)]>2]

Out[293]:
'0'
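To make the difference concrete, here is a minimal sketch that contrasts the scalar comparison from the question with the per-column mask. It reuses the question's values; the variable names are illustrative only.

```python
import pandas as pd

# Values copied from the question's example frame.
df = pd.DataFrame(
    [[1.489198, 1.329603, 1.590124, 1.123505],
     [0.024017, 0.581033, 2.500397, 0.156280]]
)

cell = df.iloc[0, 1]       # same cell as df.iloc[(0, 1)]: row 0, column 1
scalar_test = cell > 2     # a single boolean, not a mask
mask = (df > 2).any()      # a boolean Series with one entry per column

print(bool(scalar_test))
print(mask.tolist())
```

The scalar test carries no information about which columns qualify, whereas the mask has one entry for each column.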

Answer 2

Answered by jezrael

Use a mask created with df > 2 and any, then select the columns with loc:

import numpy as np
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  0.079428  2.190202 -0.134892  0.160518
1  0.442698  0.623391  1.008903  0.394249

print ((df>2).any())
0    False
1     True
2    False
3    False
dtype: bool

print (df.loc[:, (df>2).any()])
          1
0  2.190202
1  0.623391
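As a quick check that the operator form used here and the gt method from the other answer build the same mask, the sketch below compares them. The values are typed in from the printed frame above rather than regenerated from the seed, and axis=0 is written out only to show that any reduces down each column by default.

```python
import pandas as pd

# Values copied from the printed frame above.
df = pd.DataFrame(
    [[0.079428, 2.190202, -0.134892, 0.160518],
     [0.442698, 0.623391, 1.008903, 0.394249]]
)

op_mask = (df > 2).any()             # operator form
method_mask = df.gt(2).any(axis=0)   # method form, reducing over rows per column
same = op_mask.equals(method_mask)

selected = df.loc[:, op_mask]
print(same)
print(selected)
```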

EDIT (in response to a comment):

You can check your solution step by step:

It seems to work, but it always selects the second column (label 1, since Python counts from 0) when the condition is True:

print (df.iloc[(0,1)])
2.19020235741

print (df.iloc[(0,1)] > 2)
True

print (df.columns[df.iloc[(0,1)]>2])
1

print (df[df.columns[df.iloc[(0,1)]>2]])
0    2.190202
1    0.623391
Name: 1, dtype: float64

And it selects the first column (0) when the condition is False, because the booleans True and False are cast to 1 and 0:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
          0         1         2         3
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)])
0.339284706046

print (df.iloc[(0,1)] > 2)
False

print (df.columns[df.iloc[(0,1)]>2])
0

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: 0, dtype: float64

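If the column names are changed, a False result still picks the first column (a), which shows the boolean is used as a position rather than a label: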

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
          a         b         c         d
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)] > 2)
False

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: a, dtype: float64
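For comparison, the question's approach of indexing df.columns can be made to work by passing a per-column mask instead of a single boolean. This is a minimal sketch under that assumption; the values come from the seed-18 frame printed earlier in this answer and the column names a to d are used for illustration.

```python
import pandas as pd

# Values copied from the seed-18 frame shown earlier, with named columns.
df = pd.DataFrame(
    [[0.079428, 2.190202, -0.134892, 0.160518],
     [0.442698, 0.623391, 1.008903, 0.394249]],
    columns=["a", "b", "c", "d"],
)

mask = (df > 2).any()      # one boolean per column
wanted = df.columns[mask]  # labels of the qualifying columns
subset = df[wanted]        # a DataFrame, since `wanted` is a list-like of labels

print(list(wanted))
print(subset)
```

Because the mask has one entry per column, the selection follows the data rather than falling back to position 0 or 1.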
