pandas Python: find numbers within a range in a list or array

Question

Asked by ragardner

I have a list with millions of numbers, always in increasing order. I need to find and return the numbers within a specified range, e.g. greater than X but less than Y. Both the numbers in the list and the bounds I'm searching for can change.

I have been using the method below. Note that this is a basic example; in my program the numbers are not uniformly spaced like the ones shown here.

l = [i for i in range(2000000)]
nums = []
for element in l:
    if element > 950004:
        break
    if element > 950000:
        nums.append(element)
# nums == [950001, 950002, 950003, 950004]

Although this is fast, I need it to be somewhat faster for what my program does, and the numbers change a lot. Is there a better way to do this with a pandas Series or a numpy array? So far all I've done is build an example array in numpy:

import numpy
a = numpy.array(l, dtype=numpy.int64)

Would a pandas Series be more functional, perhaps using query()? What is the best way to approach this with an array rather than a Python list of Python objects?

Answer 1

Answered by Calculator

Here is a solution using binary search. With millions of numbers, binary search locates each bound in O(log n) time instead of scanning the list linearly. The final slicing step is not included in that figure: it still costs time proportional to the number of values returned.

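import bisect

l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004

# Index of the first element >= lower_bound
lower_bound_i = bisect.bisect_left(l, lower_bound)
# Index just past the last element <= upper_bound; the search starts at lower_bound_i
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
# Both bounds are inclusive here: [950000, 950001, 950002, 950003, 950004]
nums = l[lower_bound_i:upper_bound_i]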
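
Note that the loop in the question keeps values strictly greater than 950000 and up to and including 950004, whereas a bisect_left lower bound also keeps 950000 itself. As a minimal sketch, assuming the list is sorted in ascending order, the same bisect approach can be wrapped in a helper with an exclusive lower bound. The name values_in_range is hypothetical and not part of the original answer.

```python
import bisect

def values_in_range(sorted_values, low, high):
    """Return the values v with low < v <= high from an ascending list."""
    # bisect_right lands after any elements equal to low (exclusive lower bound).
    start = bisect.bisect_right(sorted_values, low)
    # bisect_right lands after any elements equal to high (inclusive upper bound).
    stop = bisect.bisect_right(sorted_values, high, lo=start)
    return sorted_values[start:stop]

l = list(range(2000000))
nums = values_in_range(l, 950000, 950004)
print(nums)  # [950001, 950002, 950003, 950004]
```

Swapping either call to bisect_left flips that bound: bisect_left for the start makes the lower bound inclusive, and bisect_left for the stop makes the upper bound exclusive.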

Answer 2

Answered by Maor Veitsman

The following are two implementations for binary search (based on code from here) - one which searches for an upper limit and one which searches for a lower limit. Does this work better for you?

def binary_search_upper(seq, limit):
    # Index of the last element <= limit, or -1 if there is none
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2  # integer division so m is a valid index
        if seq[m] <= limit and (m == len(seq) - 1 or seq[m + 1] > limit):
            return m
        elif seq[m] <= limit:
            lo = m + 1
        else:
            hi = m - 1

def binary_search_lower(seq, limit):
    # Index of the first element >= limit, or -1 if there is none
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2  # integer division so m is a valid index
        if seq[m] >= limit and (m == 0 or seq[m - 1] < limit):
            return m
        elif seq[m] < limit:
            lo = m + 1
        else:
            hi = m - 1


l = [i for i in range(2000000)]
print(binary_search_upper(l, 950004))  # 950004
print(binary_search_lower(l, 950000))  # 950000
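
The two functions return indices rather than the values themselves, so one more step is needed to get the numbers. The following is a minimal sketch of turning an inclusive (first, last) index pair into a slice. The helper slice_inclusive is hypothetical, and the standard-library bisect module is used here only to produce indices in the same convention (first index whose value is at least the lower limit, last index whose value is at most the upper limit), so that the snippet runs on its own.

```python
import bisect

def slice_inclusive(seq, first, last):
    """Return seq[first] through seq[last] inclusive; -1 means no match."""
    if first == -1 or last == -1 or last < first:
        return []
    return seq[first:last + 1]  # +1 because Python slices exclude the end index

l = list(range(2000000))
first = bisect.bisect_left(l, 950000)       # first index with l[i] >= 950000
last = bisect.bisect_right(l, 950004) - 1   # last index with l[i] <= 950004
print(slice_inclusive(l, first, last))  # [950000, 950001, 950002, 950003, 950004]
```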

Answer 3

Answered by James

You could use numpy to select a subset of the values with a boolean mask.

import numpy as np

a = np.arange(2000000)
# Boolean mask: keep the elements with 950000 < a <= 950004
nums = a[(950000 < a) & (a <= 950004)]
print(nums)
# [950001 950002 950003 950004]
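
A boolean mask compares every element of the array, so its cost grows linearly with the array length. Because the data in the question is sorted, binary search can also be done directly on the array with np.searchsorted. This is a minimal sketch, assuming the array is sorted in ascending order; it is an alternative to the mask, not part of the original answer.

```python
import numpy as np

a = np.arange(2000000)
low, high = 950000, 950004

# side="right" returns the insertion point after any elements equal to the
# value, so the slice below covers low < a <= high.
start = np.searchsorted(a, low, side="right")
stop = np.searchsorted(a, high, side="right")
nums = a[start:stop]  # basic slicing returns a view into a, not a copy
print(nums)  # [950001 950002 950003 950004]
```

Using side="left" for the start would make the lower bound inclusive instead.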
