Python NumPy: first occurrence of a value greater than an existing value

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16243955/

Time: 2020-08-18 22:06:10  Source: igfitidea

Numpy first occurrence of value greater than existing value

python numpy

Asked by user308827

I have a 1D array in numpy and I want to find the index of the first element whose value exceeds a given value.

E.g.

aa = range(-10,10)

Find the position in aa where the value 5 is exceeded.

Accepted answer by askewchan

This is a little faster (and looks nicer)

np.argmax(aa>5)

since argmax will stop at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and doesn't save another list.

In [2]: N = 10000

In [3]: aa = np.arange(-N,N)

In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop

In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop

In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop
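
One caveat worth keeping in mind: np.argmax also returns 0 when no element satisfies the condition, so the result is ambiguous unless the mask itself is checked. Below is a minimal sketch of a guarded variant. The helper name first_index_greater and the choice to return None are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np

def first_index_greater(arr, threshold):
    """Return the index of the first element > threshold, or None if there is none.

    Hypothetical helper for illustration only.
    """
    mask = np.asarray(arr) > threshold
    if mask.size == 0:
        return None                 # argmax would raise on an empty array
    idx = int(np.argmax(mask))      # first True, or 0 if the mask is all False
    # Distinguish "the first element matches" from "nothing matches".
    return idx if mask[idx] else None

aa = np.arange(-10, 10)
print(first_index_greater(aa, 5))    # 16
print(first_index_greater(aa, 100))  # None
```

The extra mask[idx] lookup is a single element access, so it should not change the timing picture noticeably.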

Answer by Moj

In [34]: a=np.arange(-10,10)

In [35]: a
Out[35]:
array([-10,  -9,  -8,  -7,  -6,  -5,  -4,  -3,  -2,  -1,   0,   1,   2,
         3,   4,   5,   6,   7,   8,   9])

In [36]: np.where(a>5)
Out[36]: (array([16, 17, 18, 19]),)

In [37]: np.where(a>5)[0][0]
Out[37]: 16

Answer by MichaelKaisers

Given the sorted content of your array, there is an even faster method: searchsorted.

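import numpy as np
N = 10000
aa = np.arange(-N,N)
# The +1 assumes N/2 occurs exactly once in aa;
# np.searchsorted(aa, N/2, side='right') avoids that assumption.
%timeit np.searchsorted(aa, N/2)+1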
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]

# Output
100000 loops, best of 3: 5.97 μs per loop
10000 loops, best of 3: 46.3 μs per loop
10000 loops, best of 3: 154 μs per loop
10000 loops, best of 3: 154 μs per loop
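
A note on np.searchsorted semantics: with the default side='left' it returns the first index whose element is greater than or equal to the value, so adding 1 yields the first strictly greater element only when the value occurs exactly once in the array. As a small sketch (the sample values are chosen for illustration), side='right' handles present and absent values alike:

```python
import numpy as np

aa = np.arange(-10, 10)

# side='left' (default): first index i with aa[i] >= value
# side='right':          first index i with aa[i] >  value
print(np.searchsorted(aa, 5))                  # 15
print(np.searchsorted(aa, 5, side='right'))    # 16
print(np.searchsorted(aa, 4.8) + 1)            # 16, although aa[15] == 5 already exceeds 4.8
print(np.searchsorted(aa, 4.8, side='right'))  # 15
```

If no element exceeds the value, side='right' returns len(aa), which is not a valid index and has to be checked by the caller.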

Answer by Nico Schlömer

I was also interested in this and I've compared all the suggested answers with perfplot. (Disclaimer: I'm the author of perfplot.)

If you know that the array you're looking through is already sorted, then

numpy.searchsorted(a, alpha)

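is for you. It's an O(log n) operation (a binary search), i.e., the speed hardly depends on the size of the array. You can't get much faster than that. Note that with the default side='left' it returns the first index with a[i] >= alpha; pass side='right' for a strictly greater comparison.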

If you don't know anything about your array, you're not going wrong with

numpy.argmax(a > alpha)

Already sorted:

[Figure: perfplot timings, already-sorted input]

Unsorted:

[Figure: perfplot timings, unsorted input]

Code to reproduce the plot:

import numpy
import perfplot


alpha = 0.5

def argmax(data):
    return numpy.argmax(data > alpha)

def where(data):
    return numpy.where(data > alpha)[0][0]

def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]

def searchsorted(data):
    return numpy.searchsorted(data, alpha)

out = perfplot.show(
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[
        argmax, where,
        nonzero,
        searchsorted
        ],
    n_range=[2**k for k in range(2, 20)],
    logx=True,
    logy=True,
    xlabel='len(array)'
    )

Answer by sivic

I would go with

i = np.min(np.where(V >= x))

where V is a vector (1D array), x is the value and i is the resulting index.
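
Note that the expression above uses >=, so it finds the first element that is greater than or equal to x. A short sketch with the question's sample data (chosen here for illustration) shows the difference from a strict comparison:

```python
import numpy as np

V = np.arange(-10, 10)
x = 5

print(np.min(np.where(V >= x)))  # 15: index of the first element >= x
print(np.min(np.where(V > x)))   # 16: index of the first element > x
```

When no element matches, np.where returns an empty index array and np.min raises a ValueError, so that case needs separate handling.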

Answer by MSeifert

Arrays that have a constant step between elements

In case of a range or any other linearly increasing array you can simply calculate the index programmatically, with no need to actually iterate over the array at all:

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('no value greater than {}'.format(val))
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    # For linearly decreasing arrays or constant arrays we only need to check
    # the first element, because if that does not satisfy the condition
    # no other element will.
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position >= len(arr) - 1:  # val is at or beyond the last element
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

One could probably improve that a bit. I have made sure it works correctly for a few sample arrays and values but that doesn't mean there couldn't be mistakes in there, especially considering that it uses floats...

>>> import numpy as np
>>> first_index_calculate_range_like(5, np.arange(-10, 10))
16
>>> np.arange(-10, 10)[16]  # double check
6

>>> first_index_calculate_range_like(4.8, np.arange(-10, 10))
15

Given that it can calculate the position without any iteration it will be constant time (O(1)) and can probably beat all other mentioned approaches. However it requires a constant step in the array, otherwise it will produce wrong results.
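
Since the calculation silently gives wrong results for non-uniform arrays, it can help to verify the constant-step assumption, for example in a test or a debug assertion. The following is a minimal sketch; the helper name has_constant_step and its tolerance parameters are assumptions for illustration. The check itself is O(n), so running it on every call would give up the O(1) advantage.

```python
import numpy as np

def has_constant_step(arr, rtol=1e-9, atol=0.0):
    """Return True if all consecutive differences of a 1D array are (nearly) equal.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(arr)
    if arr.size < 3:
        return True  # fewer than two differences: nothing to compare
    steps = np.diff(arr)
    return bool(np.allclose(steps, steps[0], rtol=rtol, atol=atol))

print(has_constant_step(np.arange(-10, 10)))         # True
print(has_constant_step(np.array([0, 1, 2, 4, 8])))  # False
```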

General solution using numba

A more general approach would be using a numba function:

import numba as nb

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

That will work for any array but it has to iterate over the array, so in the average case it will be O(n):

>>> first_index_numba(4.8, np.arange(-10, 10))
15
>>> first_index_numba(5, np.arange(-10, 10))
16

Benchmark

Even though Nico Schlömer already provided some benchmarks, I thought it might be useful to include my new solutions and to test for different "values".

The test setup:

import numpy as np
import math
import numba as nb

def first_index_using_argmax(val, arr):
    return np.argmax(arr > val)

def first_index_using_where(val, arr):
    return np.where(arr > val)[0][0]

def first_index_using_nonzero(val, arr):
    return np.nonzero(arr > val)[0][0]

def first_index_using_searchsorted(val, arr):
    return np.searchsorted(arr, val) + 1

def first_index_using_min(val, arr):
    return np.min(np.where(arr > val))

def first_index_calculate_range_like(val, arr):
    if len(arr) == 0:
        raise ValueError('empty array')
    elif len(arr) == 1:
        if arr[0] > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    first_value = arr[0]
    step = arr[1] - first_value
    if step <= 0:
        if first_value > val:
            return 0
        else:
            raise ValueError('no value greater than {}'.format(val))

    calculated_position = (val - first_value) / step

    if calculated_position < 0:
        return 0
    elif calculated_position >= len(arr) - 1:  # val is at or beyond the last element
        raise ValueError('no value greater than {}'.format(val))

    return int(calculated_position) + 1

@nb.njit
def first_index_numba(val, arr):
    for idx in range(len(arr)):
        if arr[idx] > val:
            return idx
    return -1

funcs = [
    first_index_using_argmax, 
    first_index_using_min, 
    first_index_using_nonzero,
    first_index_calculate_range_like, 
    first_index_numba, 
    first_index_using_searchsorted, 
    first_index_using_where
]

from simple_benchmark import benchmark, MultiArgument

and the plots were generated using:

%matplotlib notebook
b.plot()

item is at the beginning

b = benchmark(
    funcs,
    {2**i: MultiArgument([0, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Figure: benchmark plot, item at the beginning]

The numba function performs best followed by the calculate-function and the searchsorted function. The other solutions perform much worse.

item is at the end

b = benchmark(
    funcs,
    {2**i: MultiArgument([2**i-2, np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Figure: benchmark plot, item at the end]

For small arrays the numba function performs amazingly fast, however for bigger arrays it's outperformed by the calculate-function and the searchsorted function.

item is at sqrt(len)

b = benchmark(
    funcs,
    {2**i: MultiArgument([np.sqrt(2**i), np.arange(2**i)]) for i in range(2, 20)},
    argument_name="array size")

[Figure: benchmark plot, item at sqrt(len)]

This is more interesting. Again numba and the calculate function perform great, however this is actually triggering the worst case of searchsorted which really doesn't work well in this case.

Comparison of the functions when no value satisfies the condition

Another interesting point is how these functions behave if there is no value whose index should be returned:

arr = np.ones(100)
value = 2

for func in funcs:
    print(func.__name__)
    try:
        print('-->', func(value, arr))
    except Exception as e:
        print('-->', e)

With this result:

first_index_using_argmax
--> 0
first_index_using_min
--> zero-size array to reduction operation minimum which has no identity
first_index_using_nonzero
--> index 0 is out of bounds for axis 0 with size 0
first_index_calculate_range_like
--> no value greater than 2
first_index_numba
--> -1
first_index_using_searchsorted
--> 101
first_index_using_where
--> index 0 is out of bounds for axis 0 with size 0

searchsorted, argmax, and numba simply return a wrong value. However, searchsorted and numba at least return a recognizable one: searchsorted returns an index past the end of the array, and numba returns the sentinel -1 (which, if used directly as an index, would silently select the last element).

The functions where, min, nonzero and calculate throw an exception. However, only the exception from calculate actually says anything helpful.

That means one actually has to wrap these calls in an appropriate wrapper function that catches exceptions or invalid return values and handle appropriately, at least if you aren't sure if the value could be in the array.
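
Such a wrapper could look roughly like the sketch below. The name safe_first_index and the choice of None as the "not found" result are assumptions for illustration; two of the benchmarked functions are repeated here so that the example is self-contained.

```python
import numpy as np

def safe_first_index(func, val, arr):
    """Call func(val, arr) and return a valid index, or None if nothing exceeds val.

    Hypothetical wrapper for illustration only.
    """
    arr = np.asarray(arr)
    try:
        idx = int(func(val, arr))
    except (ValueError, IndexError):
        return None  # where/nonzero/min-style failures when nothing matches
    # Reject out-of-range results (-1, len(arr), ...) and false positives
    # such as the 0 that argmax returns for an all-False mask.
    if 0 <= idx < arr.size and arr[idx] > val:
        return idx
    return None

def first_index_using_argmax(val, arr):
    return np.argmax(arr > val)

def first_index_using_where(val, arr):
    return np.where(arr > val)[0][0]

print(safe_first_index(first_index_using_argmax, 2, np.ones(100)))        # None
print(safe_first_index(first_index_using_where, 2, np.ones(100)))         # None
print(safe_first_index(first_index_using_argmax, 5, np.arange(-10, 10)))  # 16
```

Validating the returned index against the array costs one element access, so the wrapper adds little overhead compared with the search itself.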



Note: The calculate and searchsorted options only work under special conditions. The "calculate" function requires a constant step and searchsorted requires the array to be sorted. So these can be useful in the right circumstances but aren't general solutions to this problem. If you're dealing with sorted Python lists, you might want to take a look at the bisect module instead of using NumPy's searchsorted.
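
For a sorted plain Python list, the standard-library equivalent can be sketched as follows (the sample list is an assumption for illustration). bisect_right returns the first position whose element is strictly greater than the value:

```python
from bisect import bisect_right

values = list(range(-10, 10))   # a sorted plain Python list

idx = bisect_right(values, 5)   # first position whose element is > 5
print(idx)                      # 16
print(idx < len(values))        # True: a larger element exists
print(bisect_right(values, 9))  # 20 == len(values): nothing exceeds 9
```

As with searchsorted, a result equal to len(values) means that no element exceeds the value.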

Answer by mfeldt

I'd like to propose

np.min(np.append(np.where(aa>5)[0],np.inf))

This will return the smallest index where the condition is met, while returning infinity if the condition is never met (and where returns an empty array).
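
One consequence of appending np.inf is that the result is always a float, even when a match is found. A brief sketch with the question's data (the threshold 50 is an assumption chosen to produce the no-match case):

```python
import numpy as np

aa = np.arange(-10, 10)

hit = np.min(np.append(np.where(aa > 5)[0], np.inf))
miss = np.min(np.append(np.where(aa > 50)[0], np.inf))

print(hit, miss)       # 16.0 inf
print(np.isinf(miss))  # True
# A float cannot be used as an index directly, so convert after checking for inf.
print(aa[int(hit)])    # 6
```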