pandas: Calculating percentile for specific groups
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/36944884/
Asked by Anu
I have 3 columns: Product Id, Price, and Group (values A, B, C, D).
I want to get price percentile for each group and I am running the following code.
for group, price in df.groupby(['group']):
    df['percentile'] = np.percentile(df['price'],60)
The percentile column has only one value, 3.44, for every group.
The expected values for each group were 2.12, 3.43, 3.65, 4.76, 8.99.
What is going wrong here? Please let me know.
Answered by jezrael
I think that inside the loop you should not use the whole DataFrame df with column price, but the group sub-frame price (the second loop variable) with column price:
import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)))
df.columns = ['Product Id', 'group', 'price']
print(df)
   Product Id  group  price
0           5      8      9
1           5      0      0
2           1      7      6
3           9      2      4
4           5      2      4
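# Uses the whole DataFrame on every iteration, so the result never changes
for group, price in df.groupby(['group']):
    print(np.percentile(df['price'], 60))
4.8
4.8
4.8
4.8

# Uses the sub-frame of the current group, so each group gets its own value
for group, price in df.groupby(['group']):
    print(np.percentile(price['price'], 60))
0.0
4.0
6.0
9.0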
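To make the difference between the two loops concrete, here is a minimal sketch on made-up data that mirrors the question's layout (groups A to D). The frame, its values, and the names overall and per_group are assumptions for illustration only, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's table (not the asker's values).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "D"],
    "price": [1.0, 3.0, 2.0, 6.0, 5.0, 8.0],
})

# The question's loop passes the full column, so every iteration
# computes this same overall number.
overall = np.percentile(df["price"], 60)

# Passing the sub-frame yielded by groupby gives one value per group.
per_group = {
    name: np.percentile(sub["price"], 60)
    for name, sub in df.groupby("group")
}

print(overall)
print(per_group)
```

Note also that assigning to df['percentile'] inside the loop overwrites the entire column on each pass, which is why the question ended up with a single value everywhere.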
Another solution with np.percentile, where the output is a Series:
print(df.groupby(['group'])['price'].apply(lambda x: np.percentile(x, 60)))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64
Solution with DataFrameGroupBy.quantile:
print(df.groupby(['group'])['price'].quantile(.6))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64
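The two approaches above differ only in scale: np.percentile takes a percentage (60), while quantile takes a fraction (0.6). The following sketch rebuilds the group and price columns of the 5-row example by hand and checks that both give the same result with default settings. The variable names are illustrative, and the last call, which passes a list of quantiles, is an extra not used in the answer.

```python
import numpy as np
import pandas as pd

# group and price columns copied from the 5-row example above
df = pd.DataFrame({"group": [8, 0, 7, 2, 2], "price": [9, 0, 6, 4, 4]})

# Percent scale (60) with np.percentile ...
by_percentile = df.groupby("group")["price"].apply(lambda x: np.percentile(x, 60))
# ... and fraction scale (0.6) with quantile
by_quantile = df.groupby("group")["price"].quantile(0.6)

print(np.allclose(by_percentile, by_quantile))

# Several quantiles at once: the result is indexed by (group, quantile)
multi = df.groupby("group")["price"].quantile([0.25, 0.6])
print(multi)
```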
EDIT (in response to a comment):
If you need a new column, use transform (see the pandas docs):
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(10,size=(20,3)))
>>> df.columns = ['Product Id','group','price']
>>> df
    Product Id  group  price
0            5      8      9
1            5      0      0
2            1      7      6
3            9      2      4
4            5      2      4
5            2      4      7
6            7      9      1
7            7      0      6
8            9      9      7
9            6      9      1
10           0      1      8
11           8      3      9
12           8      7      3
13           6      5      1
14           9      3      4
15           8      1      4
16           0      3      9
17           2      0      4
18           9      2      7
19           7      9      8
>>> df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))
>>> df
    Product Id  group  price  percentil
0            5      8      9        9.0
1            5      0      0        4.4
2            1      7      6        4.8
3            9      2      4        4.6
4            5      2      4        4.6
5            2      4      7        7.0
6            7      9      1        5.8
7            7      0      6        4.4
8            9      9      7        5.8
9            6      9      1        5.8
10           0      1      8        6.4
11           8      3      9        9.0
12           8      7      3        4.8
13           6      5      1        1.0
14           9      3      4        9.0
15           8      1      4        6.4
16           0      3      9        9.0
17           2      0      4        4.4
18           9      2      7        4.6
19           7      9      8        5.8
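As a possible alternative to transform with a lambda, the per-group quantile can be computed once and then mapped back onto the rows by group label. This is a sketch on made-up data, not part of the original answer. The column names percentile_map and percentile_transform are hypothetical and exist only to compare the two routes.

```python
import pandas as pd

# Hypothetical data with string group labels, as in the question
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "price": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One 60th-percentile value per group, indexed by group label
q60 = df.groupby("group")["price"].quantile(0.6)

# Route 1: look up each row's group label in q60
df["percentile_map"] = df["group"].map(q60)

# Route 2: the transform approach from the answer
df["percentile_transform"] = (
    df.groupby("group")["price"].transform(lambda x: x.quantile(0.6))
)

print(df)
```

Both columns should hold the same values. The map route evaluates the quantile once per group and keeps the intermediate q60 Series around, which can be handy if the per-group values are also needed on their own.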