Python: How do you find the IQR in Numpy?

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23228244/

Date: 2020-08-19 02:32:39  Source: igfitidea

Tags: python, numpy, scipy

Asked by Nick T

Is there a baked-in Numpy/Scipy function to find the interquartile range? I can do it pretty easily myself, but mean() exists, which is basically sum/len...

import numpy as np

def IQR(dist):
    # Interquartile range: 75th percentile minus 25th percentile
    return np.percentile(dist, 75) - np.percentile(dist, 25)

Accepted answer by Jaime

np.percentile takes multiple percentile arguments, and you are slightly better off doing:

q75, q25 = np.percentile(x, [75, 25])
iqr = q75 - q25

or

iqr = np.subtract(*np.percentile(x, [75, 25]))

than making two calls to percentile:

In [8]: x = np.random.rand(1000000)

In [9]: %timeit q75, q25 = np.percentile(x, [75, 25]); iqr = q75 - q25
10 loops, best of 3: 24.2 ms per loop

In [10]: %timeit iqr = np.subtract(*np.percentile(x, [75, 25]))
10 loops, best of 3: 24.2 ms per loop

In [11]: %timeit iqr = np.percentile(x, 75) - np.percentile(x, 25)
10 loops, best of 3: 33.7 ms per loop
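The single-call form also extends to multi-dimensional data. A minimal sketch, assuming a 2-D array whose columns are independent samples (the data array and the helper name iqr_by_column are hypothetical): with axis=0, np.percentile returns an array of shape (2, n_columns), so unpacking it gives one row of 75th percentiles and one row of 25th percentiles, and their difference is one IQR per column.

```python
import numpy as np

def iqr_by_column(data):
    """Hypothetical helper: interquartile range of each column of a 2-D array."""
    # Shape (2, n_columns): row 0 holds the 75th percentiles, row 1 the 25th
    q75, q25 = np.percentile(data, [75, 25], axis=0)
    return q75 - q25

# Hypothetical example data: two columns, five observations each
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0],
                 [5.0, 50.0]])

print(iqr_by_column(data))  # [ 2. 20.]
```

The np.subtract(*np.percentile(data, [75, 25], axis=0)) spelling works the same way, because the star unpacks the result along its first axis.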

Answer by Mad Physicist

There is now an iqr function in scipy.stats. It is available as of scipy 0.18.0. My original intent was to add it to numpy, but it was considered too domain-specific.

You may be better off just using Jaime's answer, since the scipy code is just an over-complicated version of the same.
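A minimal sketch comparing the two, assuming SciPy 0.18.0 or later is installed (the sample values are arbitrary). With default arguments scipy.stats.iqr uses the 25th and 75th percentiles with linear interpolation, so it should agree with the NumPy one-liner; its extra parameters, such as rng for a different percentile range and nan_policy for missing values, are part of what makes its code longer.

```python
import numpy as np
from scipy import stats

x = np.array([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13], dtype=float)

# Default rng=(25, 75): the ordinary interquartile range
scipy_iqr = stats.iqr(x)
numpy_iqr = np.subtract(*np.percentile(x, [75, 25]))
print(scipy_iqr, numpy_iqr)  # 10.0 10.0

# rng selects another percentile range, e.g. the 10th to 90th percentile spread
print(stats.iqr(x, rng=(10, 90)))

# Missing values: nan_policy="omit" in SciPy, np.nanpercentile in NumPy
x_nan = np.append(x, np.nan)
print(stats.iqr(x_nan, nan_policy="omit"))
print(np.subtract(*np.nanpercentile(x_nan, [75, 25])))
```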

Answer by Ham

Ignore this if Jaime's answer works for your case. But if not, according to this answer, to find the exact values of the 1st and 3rd quartiles, you should consider doing something like:

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

def find_median(sorted_list):
    """Return the median of a sorted list and the index (or indices) it is taken from."""
    indices = []

    list_size = len(sorted_list)

    if list_size % 2 == 0:
        # Even length: average the two middle elements (-1 because indexing starts at 0)
        indices.append(list_size // 2 - 1)
        indices.append(list_size // 2)

        median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
    else:
        # Odd length: take the single middle element
        indices.append(list_size // 2)

        median = sorted_list[indices[0]]

    return median, indices

median, median_indices = find_median(samples)
# For odd length the middle element is excluded from both halves;
# for even length each half keeps one of the two middle elements.
Q1, Q1_indices = find_median(samples[:median_indices[-1]])
Q3, Q3_indices = find_median(samples[median_indices[0] + 1:])

IQR = Q3 - Q1

quartiles = [Q1, median, Q3]

Code taken from the referenced answer.
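This median-of-halves approach is one of several quartile conventions, and it does not match np.percentile's default linear interpolation. A minimal sketch comparing the two on the same samples (the helper name quartiles_by_halves is hypothetical, and it swaps the hand-written find_median for the standard library's statistics.median):

```python
import statistics

import numpy as np

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

def quartiles_by_halves(sorted_data):
    """Hypothetical helper: Q1 and Q3 as the medians of the lower and upper halves."""
    n = len(sorted_data)
    lower = sorted_data[: n // 2]        # middle element excluded when n is odd
    upper = sorted_data[(n + 1) // 2 :]
    return (statistics.median(lower),
            statistics.median(sorted_data),
            statistics.median(upper))

q1, q2, q3 = quartiles_by_halves(samples)
print(q1, q2, q3)    # 10.0 14 24.5
print(q3 - q1)       # 14.5

# NumPy's default (linear interpolation) places the quartiles elsewhere
np_q1, np_q3 = np.percentile(samples, [25, 75])
print(np_q1, np_q3)  # 12.0 22.0
```

Which result counts as "exact" depends on the quartile definition you need. NumPy 1.22 and later accept a method argument to np.percentile for choosing other interpolation schemes.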

代码取自参考答案。