Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/46980001/

Date: 2020-09-14 04:42:20  Source: igfitidea

Python find numbers between range in list or array

Tags: python, arrays, list, pandas, numpy

Asked by ragardner

I have a list of millions of numbers that is always sorted in increasing order. I need to find and return the numbers within a specified range, e.g. numbers greater than X but less than Y. The numbers in the list can change, and the values I'm searching for change as well.

I have been using the method below. Please note this is a basic example; in my program the numbers are not uniformly spaced and are not the same as shown here.

l = [i for i in range(2000000)]
nums = []
for element in l:
    if element > 950004:
        break
    if element > 950000:
        nums.append(element)
#[950001, 950002, 950003, 950004]

Although this is fast, I need it to be a bit faster for what my program is doing. The numbers change a lot, so I'm wondering if there's a better way to do this with a pandas Series or a numpy array. So far all I've done is build the example as a numpy array:

import numpy

a = numpy.array(l, dtype=numpy.int64)

Would a pandas Series be more functional, perhaps making use of query()? What would be the best way to approach this with an array, as opposed to a Python list of Python objects?

Answered by Calculator

Here is a solution using binary search. Since you are dealing with millions of numbers, binary search speeds things up by reducing the cost of locating the range to O(log n), neglecting the final slicing step (which is proportional to the number of elements returned).

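import bisect

l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004

# bisect_left: index of the first element >= lower_bound
lower_bound_i = bisect.bisect_left(l, lower_bound)
# bisect_right: index just past the last element <= upper_bound
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
# Both bounds are included: [950000, 950001, 950002, 950003, 950004]
nums = l[lower_bound_i:upper_bound_i]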
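
The slice in this answer includes both bounds, while the loop in the question excludes the lower one (it keeps 950001 through 950004). A minimal sketch of that variant, assuming the list is sorted in ascending order; `values_in_range` is a hypothetical helper name, not part of the answer:

```python
import bisect

def values_in_range(sorted_values, low, high):
    """Return the items x with low < x <= high from an ascending list.

    Hypothetical helper for illustration; not from the original answer.
    """
    # bisect_right(low) is the index of the first item strictly greater than low.
    start = bisect.bisect_right(sorted_values, low)
    # bisect_right(high) is one past the last item <= high.
    stop = bisect.bisect_right(sorted_values, high, lo=start)
    return sorted_values[start:stop]

l = list(range(2000000))
nums = values_in_range(l, 950000, 950004)
print(nums)  # [950001, 950002, 950003, 950004]
```

Using `bisect_right` for the lower bound is what makes it exclusive; switching that call to `bisect_left` would include the lower bound again.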

Answered by Maor Veitsman

The following are two implementations for binary search (based on code from here) - one which searches for an upper limit and one which searches for a lower limit. Does this work better for you?

def binary_search_upper(seq, limit):
    """Return the index of the last element <= limit, or -1 if there is none."""
    lo = 0
    hi = len(seq) - 1
    while lo <= hi:
        m = (lo + hi) // 2  # integer division; "/" yields a float in Python 3
        if seq[m] <= limit and (m == len(seq) - 1 or seq[m + 1] > limit):
            return m
        elif seq[m] <= limit:
            lo = m + 1
        else:
            hi = m - 1
    return -1

def binary_search_lower(seq, limit):
    """Return the index of the first element >= limit, or -1 if there is none."""
    lo = 0
    hi = len(seq) - 1
    while lo <= hi:
        m = (lo + hi) // 2
        if seq[m] >= limit and (m == 0 or seq[m - 1] < limit):
            return m
        elif seq[m] < limit:
            lo = m + 1
        else:
            hi = m - 1
    return -1


l = [i for i in range(2000000)]
print(binary_search_upper(l, 950004))  # 950004
print(binary_search_lower(l, 950000))  # 950000
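
For comparison, the same two indices can probably be obtained from the standard library's `bisect` module instead of a hand-written loop. The sketch below assumes an ascending sequence; the function names `last_index_le` and `first_index_ge` are hypothetical and chosen only for illustration:

```python
import bisect

def last_index_le(seq, limit):
    """Index of the last item <= limit, or -1 if there is none."""
    # bisect_right returns the insertion point after any items equal to limit.
    return bisect.bisect_right(seq, limit) - 1

def first_index_ge(seq, limit):
    """Index of the first item >= limit, or -1 if there is none."""
    # bisect_left returns the insertion point before any items equal to limit.
    i = bisect.bisect_left(seq, limit)
    return i if i < len(seq) else -1

l = list(range(2000000))
upper_i = last_index_le(l, 950004)
lower_i = first_index_ge(l, 950000)
print(upper_i, lower_i)  # 950004 950000
```

Both indices are inclusive, so the matching values would be `l[lower_i:upper_i + 1]` when neither index is -1.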

Answered by James

You could use numpy to get a subset of your list using a boolean mask (boolean indexing).

import numpy as np
a = np.arange(2000000)
nums = a[(950000<a) & (a<=950004)]
print(nums)
# prints [950001 950002 950003 950004]
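
The boolean mask compares every element, so each query is O(n). Because the data is sorted, `np.searchsorted` can locate the slice boundaries by binary search instead. A minimal sketch, assuming the array stays sorted in ascending order and using the same bounds as the question's loop (greater than 950000, up to and including 950004):

```python
import numpy as np

a = np.arange(2000000)  # assumed to be sorted in ascending order
low, high = 950000, 950004

# side="right" returns the index just past the last element <= the value,
# so the slice below selects the elements with low < x <= high.
start = np.searchsorted(a, low, side="right")
stop = np.searchsorted(a, high, side="right")
nums = a[start:stop]  # basic slicing returns a view, not a copy
print(nums)  # [950001 950002 950003 950004]
```

If the lower bound should be included as well, `side="left"` for `low` would likely be the change to make.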