pandas: How to select all rows which contain values greater than a threshold?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/42613467/

Time: 2020-09-14 03:07:12  Source: igfitidea

How to select all rows which contain values greater than a threshold?

Tags: python, pandas, dataframe

Asked by displayname

The request is simple: I want to select all rows which contain a value greater than a threshold.

If I do it like this:

df[(df > threshold)]

I get these rows, but values below that threshold are simply NaN. How do I avoid selecting these rows?
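
A minimal sketch of the behavior described here, on a small hypothetical frame (the column names and values are assumptions chosen only for illustration): indexing with a Boolean DataFrame masks cell by cell rather than filtering rows, so every row survives and the non-matching cells turn into NaN.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"a": [1, 5, 2], "b": [7, 3, 1]})
threshold = 4

# A Boolean DataFrame used as an indexer masks individual cells:
# the shape is unchanged and cells that fail the test become NaN.
masked = df[df > threshold]
print(masked)
```

Note that the last row, which has no value above the threshold, is still present and is entirely NaN.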

Answered by miradulo

There is absolutely no need for the double transposition - you can simply call any along the column index (supplying 1 or 'columns' as the axis) on your Boolean matrix.

df[(df > threshold).any(axis=1)]

Example

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 100, 50).reshape(5, 10))

>>> df

    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
2  37   2  55  68  16  14  93  14  71  84
3  67  45  79  75  27  94  46  43   7  40
4  61  65  73  60  67  83  32  77  33  96

>>> df[(df > 95).any(axis=1)]

    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
4  61  65  73  60  67  83  32  77  33  96
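
Because the frame above is random, the output differs from run to run. As a reproducible sketch of the same selection, here it is on a small fixed frame (the data and the names row_mask and selected are assumptions made for illustration), with the axis passed by keyword:

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
df = pd.DataFrame({"a": [1, 5, 2], "b": [7, 3, 1]})
threshold = 4

# Reduce the Boolean matrix along the columns: one True/False per row,
# True when at least one value in that row exceeds the threshold.
row_mask = (df > threshold).any(axis=1)

# Indexing with a Boolean Series filters whole rows and keeps their values.
selected = df[row_mask]
print(selected)
```

Here the first two rows are kept with their original values, and the last row, which has no value above 4, is dropped.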

Transposing as your self-answer does is just an unnecessary performance hit.

df = pd.DataFrame(np.random.randint(0, 100, 10**8).reshape(10**4, 10**4))

# standard way
%timeit df[(df > 95).any(axis=1)]
1 loop, best of 3: 8.48 s per loop

# transposing
%timeit df[df.T[(df.T > 95)].any()]
1 loop, best of 3: 13 s per loop

Answered by displayname

This is actually very simple:

df[df.T[(df.T > 0.33)].any()]
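
A rough check, under stated assumptions, that this transposed form selects the same rows as reducing with any along axis=1. The seeded random data, the threshold of 0.8, and the variable names are all illustrative. The transposed form relies on the surviving values being nonzero (NaN and 0 both count as False for any), which holds whenever the threshold is non-negative.

```python
import numpy as np
import pandas as pd

# Seeded random data in [0, 1); purely illustrative.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 4)))
threshold = 0.8  # assumed non-negative, so kept values are nonzero

# Transposed form: mask cells to NaN, then reduce over the original columns.
via_transpose = df[df.T[(df.T > threshold)].any()]

# Direct form: reduce the Boolean matrix along the columns.
via_axis = df[(df > threshold).any(axis=1)]

print(via_transpose.equals(via_axis))
```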