Calculate Arbitrary Percentile on Pandas GroupBy

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/19894939/

pandas

Asked by Alex Rothberg

Currently there is a median method on pandas GroupBy objects.

Is there a way to calculate an arbitrary percentile (see: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html) on the groupings?

The median would be the percentile calculation with q=50.

Answered by TomAugspurger

You want the quantile method:

In [47]: df
Out[47]: 
           A         B    C
0   0.719391  0.091693  one
1   0.951499  0.837160  one
2   0.975212  0.224855  one
3   0.807620  0.031284  one
4   0.633190  0.342889  one
5   0.075102  0.899291  one
6   0.502843  0.773424  one
7   0.032285  0.242476  one
8   0.794938  0.607745  one
9   0.620387  0.574222  one
10  0.446639  0.549749  two
11  0.664324  0.134041  two
12  0.622217  0.505057  two
13  0.670338  0.990870  two
14  0.281431  0.016245  two
15  0.675756  0.185967  two
16  0.145147  0.045686  two
17  0.404413  0.191482  two
18  0.949130  0.943509  two
19  0.164642  0.157013  two

In [48]: df.groupby('C').quantile(.95)
Out[48]: 
            A         B
C                      
one  0.964541  0.871332
two  0.826112  0.969558
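
As a rough, self-contained sketch of the call above: the random data and seed below are assumptions made for illustration, not the frame shown in the session. It suggests that quantile takes a fraction between 0 and 1, that quantile(0.5) should agree with median, and that a list of fractions can be passed to get several quantiles at once.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 random rows split into two groups, "one" and "two".
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.random(20),
    "B": rng.random(20),
    "C": ["one"] * 10 + ["two"] * 10,
})

# 95th percentile per group; q is a fraction in [0, 1], not a value in [0, 100].
q95 = df.groupby("C").quantile(0.95)

# The 0.5 quantile is the median, so these two frames should match.
q50 = df.groupby("C").quantile(0.5)
med = df.groupby("C").median()

# A list of fractions returns one row per (group, quantile) pair.
multi = df.groupby("C").quantile([0.5, 0.95])
```

With a list of fractions, the result is indexed by both the group key and the quantile, so it has four rows here rather than two.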

Answered by Anshuman Goel

I found another useful solution here

If I have to use groupby, another approach can be:

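import numpy as np

def percentile(n):
    # Build an aggregation function that returns the n-th percentile (0-100) of a group.
    def percentile_(x):
        return np.percentile(x, n)
    # Give the function a distinct name so agg labels the result column, e.g. percentile_95.
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_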

Using the call below, I am able to achieve the same result as the solution given by @TomAugspurger:

df.groupby('C').agg([percentile(50), percentile(95)])
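
To check that the two answers agree, here is a minimal sketch on made-up data (the frame, seed, and variable names are assumptions for illustration). It repeats the percentile factory so the block runs on its own, then compares the agg result against quantile. Note the different scales: np.percentile expects 0 to 100, while quantile expects 0 to 1.

```python
import numpy as np
import pandas as pd

def percentile(n):
    # Factory from the answer above: returns a named n-th percentile aggregator.
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = "percentile_%s" % n
    return percentile_

# Hypothetical data: one numeric column and a two-level grouping key.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.random(20),
    "C": ["one"] * 10 + ["two"] * 10,
})

# Several percentiles in one pass; columns are named after each function's __name__.
via_agg = df.groupby("C")["A"].agg([percentile(50), percentile(95)])

# The built-in equivalent for the 95th percentile, on the 0-1 scale.
via_quantile = df.groupby("C")["A"].quantile(0.95)
```

Both use linear interpolation by default, so the percentile_95 column should match the quantile(0.95) result on data without missing values. With NaNs present the two can differ, since np.percentile propagates NaN while quantile skips it.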