pandas: How to select all rows that contain a value greater than a threshold?

Question

Asked by displayname

The request is simple: I want to select all rows which contain a value greater than a threshold.

If I do it like this:

df[(df > threshold)]

This returns every row, with the values that are not above the threshold simply replaced by NaN. How do I avoid selecting the rows that have no value above the threshold at all?

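To make the behaviour in the question concrete, here is a small sketch on a hypothetical 3 x 2 frame (column names and values are made up for illustration). Indexing with a Boolean DataFrame masks cell by cell and keeps the shape, while indexing with a Boolean Series selects whole rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only to make the two behaviours visible.
df = pd.DataFrame({"a": [1, 7, 2], "b": [3, 4, 9]})
threshold = 5

# Boolean *DataFrame* as the key: cell-by-cell masking (like df.where).
# Nothing is filtered out; cells that fail the test become NaN.
masked = df[df > threshold]
print(masked.shape)  # (3, 2), same shape as df

# Boolean *Series* aligned on the row index as the key: row selection.
row_mask = (df > threshold).any(axis=1)
selected = df[row_mask]
print(selected)
```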
Answer 1

Answered by miradulo

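There is absolutely no need for the double transposition: you can simply call .any(axis=1) (or, equivalently, .any(axis='columns')) on your Boolean mask. That reduces across the columns, giving one True/False value per row, and indexing df with the resulting Boolean Series keeps only the rows marked True.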

df[(df > threshold).any(axis=1)]
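The one-liner can be unpacked into its three steps. This is a minimal sketch with hypothetical data chosen so that exactly one row has no value above the threshold; df.gt(threshold) and df.loc[...] appear in the comments as equivalent spellings:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 each contain a value above 95, row 2 does not.
df = pd.DataFrame({"x": [10, 50, 94], "y": [97, 20, 30], "z": [5, 96, 1]})
threshold = 95

# Step 1: element-wise comparison -> Boolean DataFrame of the same shape.
cell_mask = df > threshold          # same as df.gt(threshold)

# Step 2: reduce across the columns of each row -> one bool per row.
row_mask = cell_mask.any(axis=1)    # axis="columns" means the same thing

# Step 3: a Boolean Series as the key keeps the rows marked True.
result = df[row_mask]               # df.loc[row_mask] is equivalent
print(result)
```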

Example

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 100, 50).reshape(5, 10))

>>> df

    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
2  37   2  55  68  16  14  93  14  71  84
3  67  45  79  75  27  94  46  43   7  40
4  61  65  73  60  67  83  32  77  33  96

>>> df[(df > 95).any(axis=1)]

    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
4  61  65  73  60  67  83  32  77  33  96
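The example frame is all numeric. If a frame also carries non-numeric columns (the "name" column below is hypothetical), comparing a string column with a number raises a TypeError, so one option, sketched here under that assumption, is to build the mask from the numeric columns only and then use it to filter the whole frame:

```python
import pandas as pd

# Hypothetical frame mixing a label column with numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "v1": [10, 99, 3],
    "v2": [97, 5, 8],
})
threshold = 95

# Compare only the numeric columns, avoiding a str-vs-int comparison.
numeric = df.select_dtypes(include="number")
row_mask = (numeric > threshold).any(axis=1)

# The mask is aligned on the row index, so it filters the full frame
# and the non-numeric columns are kept in the output.
result = df[row_mask]
print(result)
```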

Transposing, as your self-answer does, is just an unnecessary performance hit:

# IPython/Jupyter session: %timeit is an IPython magic, not plain Python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, 10**8).reshape(10**4, 10**4))

# standard way
%timeit df[(df > 95).any(axis=1)]
1 loop, best of 3: 8.48 s per loop

# transposing
%timeit df[df.T[(df.T > 95)].any()]
1 loop, best of 3: 13 s per loop
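%timeit only works in IPython/Jupyter. A rough plain-Python equivalent using the standard timeit module is sketched below. The frame is deliberately much smaller than the 10**4 x 10**4 one above (the sizes are arbitrary) so it finishes quickly, which means the absolute timings will not match the numbers quoted here. It also checks that both expressions select the same rows on this data:

```python
import timeit

import numpy as np
import pandas as pd

# Smaller, seeded frame so the comparison runs in a few seconds at most.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(2_000, 200)))


def standard():
    # One comparison, then a row-wise reduction.
    return df[(df > 95).any(axis=1)]


def transposed():
    # The self-answer's approach: transpose, mask, reduce, then index.
    return df[df.T[(df.T > 95)].any()]


# Sanity check: both select the same rows here (all values are >= 0).
assert standard().equals(transposed())

for fn in (standard, transposed):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```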

Answer 2

Answered by displayname

This is actually very simple:

df[df.T[(df.T > 0.33)].any()]
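Apart from speed, the transposed version does something slightly different: .any() on the masked frame checks whether the surviving values are truthy, not whether the comparison was True (NaN cells are skipped). For a threshold of 0 or more this makes no difference, because every value that passes is non-zero, but with a negative threshold a value of 0 passes the comparison and is still counted as False. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical data with a negative threshold to expose the difference.
df = pd.DataFrame({"a": [0, -5], "b": [-3, -2]})
threshold = -1

# Row-wise any() on the comparison itself: row 0 has 0 > -1.
standard = df[(df > threshold).any(axis=1)]

# Transposed version: the surviving value 0 is falsy, so any() is False.
transposed = df[df.T[(df.T > threshold)].any()]

print(list(standard.index))    # [0]
print(list(transposed.index))  # []
```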
