pandas: How to select all rows which contain values greater than a threshold?
Original question: http://stackoverflow.com/questions/42613467/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to select all rows which contain values greater than a threshold?
Asked by displayname
The request is simple: I want to select all rows which contain a value greater than a threshold.
If I do it like this:
df[(df > threshold)]
I get every row back, with the values that do not exceed the threshold replaced by NaN. How do I avoid selecting the rows that contain no value above the threshold?
Answered by miradulo
There is absolutely no need for the double transposition; you can simply call any along the column axis (supplying axis=1 or axis='columns') on your Boolean matrix.
df[(df > threshold).any(axis=1)]
Example
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 100, 50).reshape(5, 10))
>>> df
    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
2  37   2  55  68  16  14  93  14  71  84
3  67  45  79  75  27  94  46  43   7  40
4  61  65  73  60  67  83  32  77  33  96
>>> df[(df > 95).any(axis=1)]
    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
4  61  65  73  60  67  83  32  77  33  96
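Since the frame above comes from random numbers, here is a minimal reproducible sketch that hard-codes the same five rows. It contrasts indexing with the full Boolean matrix (which keeps every row and fills non-matching cells with NaN) against reducing the matrix per row first. The names threshold, mask, masked and selected are only illustrative and not part of the original answer. The keyword form axis=1 is an assumption about your pandas version: as far as I know, pandas 2.x only accepts the axis as a keyword argument for any.

```python
import pandas as pd

# Same values as the example frame above, hard-coded for reproducibility.
df = pd.DataFrame([
    [45, 53, 89, 63, 62, 96, 29, 56, 42, 6],
    [0, 74, 41, 97, 45, 46, 38, 39, 0, 49],
    [37, 2, 55, 68, 16, 14, 93, 14, 71, 84],
    [67, 45, 79, 75, 27, 94, 46, 43, 7, 40],
    [61, 65, 73, 60, 67, 83, 32, 77, 33, 96],
])
threshold = 95  # illustrative value, matching the example above

# Boolean frame: True wherever a cell exceeds the threshold.
mask = df > threshold

# Indexing with the 2-D mask keeps every row and turns non-matching cells into NaN.
masked = df[mask]

# Reducing the mask across columns gives one Boolean per row,
# which selects whole rows and leaves their values untouched.
selected = df[mask.any(axis=1)]

print(selected.index.tolist())  # [0, 1, 4]
```

Only rows 0, 1 and 4 contain a value above 95 (96, 97 and 96 respectively), so those are the rows that survive.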
Transposing as your self-answer does is just an unnecessary performance hit.
df = pd.DataFrame(np.random.randint(0, 100, 10**8).reshape(10**4, 10**4))
# standard way
%timeit df[(df > 95).any(axis=1)]
1 loop, best of 3: 8.48 s per loop
# transposing
%timeit df[df.T[(df.T > 95)].any()]
1 loop, best of 3: 13 s per loop
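If you want to repeat the comparison outside IPython, the following is a rough sketch using the standard timeit module instead of the %timeit magic. The helper names direct and transposed, the seed, and the smaller 1000 x 1000 frame are my own assumptions so the script finishes quickly; they are not from the original benchmark. Timings depend on the machine and the pandas version, so no figures are claimed.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10**4 x 10**4 frame above, so the script finishes quickly.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(1000, 1000)))


def direct(frame, threshold=95):
    # Reduce the Boolean mask across columns: one flag per row.
    return frame[(frame > threshold).any(axis=1)]


def transposed(frame, threshold=95):
    # The self-answer's approach: mask the transposed frame, then reduce per column.
    return frame[frame.T[(frame.T > threshold)].any()]


# Both approaches should select the same rows for this data.
same = direct(df).equals(transposed(df))

# Time five runs of each approach.
t_direct = timeit.timeit(lambda: direct(df), number=5)
t_transposed = timeit.timeit(lambda: transposed(df), number=5)

print(f"same rows: {same}")
print(f"direct: {t_direct:.3f}s  transposed: {t_transposed:.3f}s")
```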
Answered by displayname
This is actually very simple:
df[df.T[(df.T > 0.33)].any()]