Python pandas: find percentile stats of a given column
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/39581893/
pandas: find percentile stats of a given column
Asked by Edamame
I have a pandas DataFrame my_df, and I can find the mean(), median(), and mode() of a given column:
my_df['field_A'].mean()
my_df['field_A'].median()
my_df['field_A'].mode()
I am wondering whether it is possible to find more detailed stats, such as the 90th percentile. Thanks!
Answered by stackoverflowuser2010
You can use the pandas.DataFrame.quantile() function (or pandas.Series.quantile() for a single column), as shown below.
import pandas as pd
import random
A = [ random.randint(0,100) for i in range(10) ]
B = [ random.randint(0,100) for i in range(10) ]
df = pd.DataFrame({ 'field_A': A, 'field_B': B })
df
# field_A field_B
# 0 90 72
# 1 63 84
# 2 11 74
# 3 61 66
# 4 78 80
# 5 67 75
# 6 89 47
# 7 12 22
# 8 43 5
# 9 30 64
df.field_A.mean() # Same as df['field_A'].mean()
# 54.399999999999999
df.field_A.median()
# 62.0
# Call `quantile(q)` to get the value at quantile `q`,
# where `q` is a fraction between 0 and 1 (not a percentage).
df.field_A.quantile(0.1) # 10th percentile
# 11.9
df.field_A.quantile(0.5) # same as median
# 62.0
df.field_A.quantile(0.9) # 90th percentile
# 89.10000000000001
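For a reproducible version of the example above, here is a minimal sketch that uses the fixed values from the sample output instead of random ones. The names p10, p90, p90_np and per_column are illustrative and not part of the original answer. The comparison with np.percentile assumes the default linear interpolation on both sides.

```python
import numpy as np
import pandas as pd

# Fixed data copied from the sample output above, so results are reproducible.
df = pd.DataFrame({
    'field_A': [90, 63, 11, 61, 78, 67, 89, 12, 43, 30],
    'field_B': [72, 84, 74, 66, 80, 75, 47, 22, 5, 64],
})

# Series.quantile takes a fraction in [0, 1], not a percentage.
p10 = df['field_A'].quantile(0.1)
p90 = df['field_A'].quantile(0.9)

# np.percentile takes a percentage in [0, 100]; with its default
# (linear) interpolation it should agree with Series.quantile.
p90_np = np.percentile(df['field_A'], 90)

# Passing a list to DataFrame.quantile returns one row per requested
# quantile and one column per DataFrame column.
per_column = df.quantile([0.1, 0.5, 0.9])
print(per_column)
```

Passing a list of fractions is a convenient way to get several percentiles for every column in one call.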
Answered by piRSquared
Assume a Series s:
import numpy as np
import pandas as pd
s = pd.Series(np.arange(100))
Get the quantiles for [.1, .2, .3, .4, .5, .6, .7, .8, .9]:
# endpoint=False excludes the stop value 1, leaving 0.1 through 0.9
s.quantile(np.linspace(.1, 1, 9, endpoint=False))
0.1 9.9
0.2 19.8
0.3 29.7
0.4 39.6
0.5 49.5
0.6 59.4
0.7 69.3
0.8 79.2
0.9 89.1
dtype: float64
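Or, with 'lower' interpolation, which returns an actual data point instead of an interpolated value:
s.quantile(np.linspace(.1, 1, 9, endpoint=False), interpolation='lower')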
0.1 9
0.2 19
0.3 29
0.4 39
0.5 49
0.6 59
0.7 69
0.8 79
0.9 89
dtype: int32
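To see how the interpolation argument changes the result, the following sketch puts several methods side by side for the same Series. The names qs, methods and comparison are illustrative and not from the original answer. The method names are the ones I understand Series.quantile to accept.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100))

# endpoint=False excludes the stop value 1.0, giving 0.1, 0.2, ..., 0.9.
qs = np.linspace(0.1, 1, 9, endpoint=False)

# One column per interpolation method, one row per quantile.
methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest']
comparison = pd.DataFrame(
    {m: s.quantile(qs, interpolation=m) for m in methods}
)
print(comparison)
```

For the 0.1 quantile the target position falls between the values 9 and 10, so 'lower' picks 9, 'higher' picks 10, 'midpoint' averages them, and 'linear' interpolates between them.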
Answered by Edamame
I figured out that the following works:
my_df.dropna().quantile([0.0, .9])
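One thing to keep in mind: as far as I know, quantile() already skips missing values column by column, while dropna() on the whole DataFrame removes every row that has a NaN in any column. The two can therefore give different results. The sketch below uses made-up data (not from the post) to show the difference, and also shows describe() with a percentiles argument as another way to get the same numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
my_df = pd.DataFrame({
    'field_A': [1.0, 2.0, 3.0, 4.0, np.nan],
    'field_B': [10.0, np.nan, 30.0, 40.0, 50.0],
})

# quantile() ignores NaN column by column, so each column keeps
# all of its own non-missing values.
skip_nan = my_df.quantile([0.0, 0.9])

# dropna() first removes every row with a NaN in any column,
# so only rows 0, 2 and 3 remain here.
drop_rows = my_df.dropna().quantile([0.0, 0.9])

# describe() can report chosen percentiles next to count, mean, etc.
summary = my_df['field_A'].describe(percentiles=[0.5, 0.9])
print(skip_nan, drop_rows, summary, sep='\n\n')
```

If the DataFrame also has non-numeric columns, recent pandas versions may need numeric_only=True in DataFrame.quantile(); treat that as something to check against your installed version.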