pandas: calculating percentile for specific groups

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36944884/

Time: 2020-09-14 01:08:53  Source: igfitidea

Calculating percentile for specific groups

Tags: python, loops, numpy, pandas, group-by

Asked by Anu

I have 3 columns: Product Id, Price, and Group (values A, B, C, D).

I want to get the price percentile for each group, and I am running the following code:

for group, price in df.groupby(['group']):
    df['percentile'] = np.percentile(df['price'],60)

The percentile column has only one value, 3.44, for every group. The expected values for each group were 2.12, 3.43, 3.65, 4.76, 8.99.

What is going wrong here? Please let me know.

Answered by jezrael

I think that inside the loop you need to use the group's sub-DataFrame price (its column price), not the whole DataFrame df with column price:

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5,3)))
df.columns = ['Product Id','group','price']
print(df)
   Product Id  group  price
0           5      8      9
1           5      0      0
2           1      7      6
3           9      2      4
4           5      2      4

# df['price'] is the whole column, so every iteration prints the same value
for group, price in df.groupby(['group']):
    print(np.percentile(df['price'], 60))
4.8
4.8
4.8
4.8

# price is the sub-DataFrame of the current group
for group, price in df.groupby(['group']):
    print(np.percentile(price['price'], 60))
0.0
4.0
6.0
9.0
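
To make the difference concrete, here is a minimal sketch that collects one percentile per group in a dictionary instead of printing it. The toy data and the names per_group, name and sub are assumptions for illustration, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's real data set.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

per_group = {}
# Grouping by a plain string keeps each group key a scalar.
for name, sub in df.groupby("group"):
    # `sub` contains only the rows of the current group, so the
    # percentile is computed per group, not over the whole column.
    per_group[name] = float(np.percentile(sub["price"], 60))

print(per_group)
```

Assigning a scalar to df['percentile'] inside the loop, as in the question, overwrites the entire column on every iteration, which is why only one value survived there.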

Another solution with np.percentile, where the output is a Series:

print(df.groupby(['group'])['price'].apply(lambda x: np.percentile(x, 60)))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64

Solution with DataFrameGroupBy.quantile:

print(df.groupby(['group'])['price'].quantile(.6))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64
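
The two approaches above give the same numbers because, as far as I know, np.percentile and the groupby quantile both default to linear interpolation. A small sketch to check this on toy data (the data and the names via_numpy and via_pandas are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for the comparison.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby("group")["price"]

# 60th percentile via NumPy (scale 0-100) ...
via_numpy = grouped.apply(lambda x: np.percentile(x, 60))
# ... and via pandas (scale 0-1).
via_pandas = grouped.quantile(0.6)

# Both use linear interpolation by default, so they should agree.
same = np.allclose(via_numpy.to_numpy(), via_pandas.to_numpy())
print(same)
```

Note the different scales: np.percentile takes 60, while quantile takes 0.6.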

EDIT by comment:

If you need a new column, use transform:

>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(10,size=(20,3)))
>>> df.columns = ['Product Id','group','price']
>>> df
    Product Id  group  price
0            5      8      9
1            5      0      0
2            1      7      6
3            9      2      4
4            5      2      4
5            2      4      7
6            7      9      1
7            7      0      6
8            9      9      7
9            6      9      1
10           0      1      8
11           8      3      9
12           8      7      3
13           6      5      1
14           9      3      4
15           8      1      4
16           0      3      9
17           2      0      4
18           9      2      7
19           7      9      8
>>> df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))
>>> df
    Product Id  group  price  percentil
0            5      8      9        9.0
1            5      0      0        4.4
2            1      7      6        4.8
3            9      2      4        4.6
4            5      2      4        4.6
5            2      4      7        7.0
6            7      9      1        5.8
7            7      0      6        4.4
8            9      9      7        5.8
9            6      9      1        5.8
10           0      1      8        6.4
11           8      3      9        9.0
12           8      7      3        4.8
13           6      5      1        1.0
14           9      3      4        9.0
15           8      1      4        6.4
16           0      3      9        9.0
17           2      0      4        4.4
18           9      2      7        4.6
19           7      9      8        5.8
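
As an alternative sketch (not part of the original answer), the same column can be built by computing one quantile per group and mapping it back onto the rows through the group key. The toy data and the column name percentile are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# One value per group, indexed by the group key.
per_group = df.groupby("group")["price"].quantile(0.6)

# Broadcast the per-group value back to every row of that group.
df["percentile"] = df["group"].map(per_group)

# The transform-based version from the answer, for comparison.
via_transform = df.groupby("group")["price"].transform(lambda x: x.quantile(0.6))

print(df)
```

Both variants give every row the value of its own group; transform keeps it to one expression, while the map version keeps the per-group table around for reuse.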

Answered by Sam

You could try pandas quantile:

df[['group', 'price']].groupby('group').quantile(.6)

Return values at the given quantile over requested axis, a la numpy.percentile.
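
A short sketch of what this call returns, on made-up data: because the selection uses double brackets, the result is a DataFrame indexed by group rather than a Series. The names result and flat are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Product Id": [1, 2, 3, 4, 5, 6],
    "group": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# DataFrame in, DataFrame out: one row per group, one column per value column.
result = df[["group", "price"]].groupby("group").quantile(0.6)

# reset_index turns the group key back into an ordinary column.
flat = result.reset_index()

print(flat)
```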
