Pandas groupby quantile values

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/47637774/

Date: 2020-09-14 04:51:39  Source: igfitidea

Tags: python, pandas

Asked by lignin

I tried to calculate specific quantile values from a data frame, as shown in the code below. There was no problem when I calculated them on separate lines.

When I attempt to run the last 2 lines, I get the following error:

AttributeError: 'SeriesGroupBy' object has no attribute 'quantile(0.25)'

How can I fix this?

import pandas as pd
df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'y': [7, 6, 5, 4, 3, 2, 1, 0],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000]
    }
)
f = {'number': ['median', 'std', 'quantile']}
df1 = df.groupby('x').agg(f)
df.groupby('x').quantile(0.25)
df.groupby('x').quantile(0.75)

# code below with problem:
f = {'number': ['median', 'std', 'quantile(0.25)', 'quantile(0.75)']}
df1 = df.groupby('x').agg(f)

Answered by YOBEN_S

I prefer defining named functions with def:

def q1(x):
    return x.quantile(0.25)

def q2(x):
    return x.quantile(0.75)

f = {'number': ['median', 'std', q1, q2]}
df1 = df.groupby('x').agg(f)
df1
Out[1643]: 
  number                            
  median           std     q1     q2
x                                   
0  52500  17969.882211  40000  61250
1  43000  16337.584481  35750  55000
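
When only the quantiles are needed, the helper functions can be skipped. The following is a minimal sketch, not part of the original answer, that assumes the same sample data as the question and relies on the groupby quantile method accepting a list of quantile levels; the variable name quartiles is illustrative.

```python
import pandas as pd

# Same sample data as in the question (the unused 'y' column is omitted).
df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000],
    }
)

# Passing a list of levels returns a Series indexed by (x, quantile level);
# unstack() moves the quantile level into the columns.
quartiles = df.groupby('x')['number'].quantile([0.25, 0.75]).unstack()
print(quartiles)
```

The resulting columns are labelled 0.25 and 0.75 and hold the same values as the q1 and q2 columns above. Unlike agg, this route does not combine the quantiles with median and std in a single call.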

Answered by Jurgen Strydom

@WeNYoBen's answer is great. It has one limitation, though: you need to write a new function for every quantile, which becomes unpythonic when the number of quantiles grows large. A better approach is to use a function that creates the quantile functions and renames each one appropriately.

def rename(newname):
    def decorator(f):
        f.__name__ = newname
        return f
    return decorator

def q_at(y):
    @rename(f'q{y:0.2f}')
    def q(x):
        return x.quantile(y)
    return q

f = {'number': ['median', 'std', q_at(0.25), q_at(0.75)]}
df1 = df.groupby('x').agg(f)
df1

Out[]:
  number                            
  median           std  q0.25  q0.75
x                                   
0  52500  17969.882211  40000  61250
1  43000  16337.584481  35750  55000

The rename decorator gives each returned function a distinct name so that the pandas agg function can handle reuse of the same quantile function (otherwise all quantile results end up in columns named q).

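
To illustrate how the factory scales, here is a rough sketch that builds the aggregation list for several quantiles at once. It sets the function name directly instead of using the rename decorator, which has the same effect. The list of levels and the variable names levels, funcs, names and result are illustrative assumptions, not from the original answer.

```python
import pandas as pd

# Same sample data as in the question (the unused 'y' column is omitted).
df = pd.DataFrame(
    {
        'x': [0, 1, 0, 1, 0, 1, 0, 1],
        'number': [25000, 35000, 45000, 50000, 60000, 70000, 65000, 36000],
    }
)

def q_at(y):
    """Return a function that computes quantile y, named after y."""
    def q(x):
        return x.quantile(y)
    # agg labels the output column with the function's __name__,
    # so each generated function needs its own name.
    q.__name__ = f'q{y:0.2f}'
    return q

levels = [0.10, 0.25, 0.50, 0.75, 0.90]  # illustrative choice of quantiles
funcs = [q_at(y) for y in levels]
names = [fn.__name__ for fn in funcs]
print(names)

result = df.groupby('x').agg({'number': funcs})
print(result)
```

Each generated function carries its own name (q0.10, q0.25, and so on), so the quantile columns stay distinguishable without writing one def per quantile.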