Median of a pandas DataFrame (Python)

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/29778636/

Date: 2020-08-19 05:00:04  Source: igfitidea

Tags: python, r, numpy, pandas

Question by Ssank

I have a DataFrame df:

name   count    
aaaa   2000    
bbbb   1900    
cccc    900    
dddd    500    
eeee    100

I would like to see the rows whose count is within a factor of 10 of the median of the count column.

I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas/numpy for this?
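One way to proceed from the median is sketched below, assuming "within a factor of 10" means median / 10 <= count <= median * 10 and that the median is positive. The helper name rows_within_factor is hypothetical, not part of pandas:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})


def rows_within_factor(frame, column, factor):
    """Return rows whose `column` value lies within `factor` of the column median.

    Assumes a positive median, so the ratio below is well defined.
    """
    median = frame[column].median()
    ratio = frame[column] / median  # how many times larger or smaller than the median
    return frame[(ratio >= 1 / factor) & (ratio <= factor)]


print(rows_within_factor(df, "count", 10))  # every row: 100..2000 lies within 90..9000
print(rows_within_factor(df, "count", 2))   # only cccc and dddd: bounds are 450..1800
```

With this sample a factor of 10 keeps every row, so the filter only becomes selective for smaller factors.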

Expected output:

name   count   distance from median
aaaa   2000    *****

I can use any measure of distance from the median (absolute deviation from the median, quantiles, etc.).
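If a quantile-based measure is preferred, one option (an illustrative choice, not taken from the answers) is to rescale each row's rank to the range 0..1 and measure its distance from 0.5, the median position. The column name quantile_dist is hypothetical, and the sketch assumes at least two rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# Rank 1 = smallest count; tied values would share their average rank
rank = df["count"].rank()
# Rescale ranks to 0..1 so the smallest row is 0 and the largest is 1
position = (rank - 1) / (len(df) - 1)
# Distance from the median position (0.5); 0 means the row is the median
df["quantile_dist"] = (position - 0.5).abs()
print(df)
```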

Accepted answer by ComputerFellow

If you're looking for how to calculate the median absolute deviation (MAD):

In [1]: df['dist'] = abs(df['count'] - df['count'].median())

In [2]: df
Out[2]:
   name  count  dist
0  aaaa   2000  1100
1  bbbb   1900  1000
2  cccc    900     0
3  dddd    500   400
4  eeee    100   800

In [3]: df['dist'].median()
Out[3]: 800.0
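The per-row dist column and its median (the MAD) can be combined to select rows close to the median. A minimal sketch, where the multiplier k is a hypothetical tuning parameter rather than anything from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# Absolute deviation of each row from the column median
df["dist"] = (df["count"] - df["count"].median()).abs()
# Median absolute deviation (MAD) of the column
mad = df["dist"].median()

k = 1  # hypothetical multiplier: keep rows at most k * MAD from the median
close = df[df["dist"] <= k * mad]
print(close)  # rows cccc, dddd and eeee (dist 0, 400, 800 <= 800)
```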

Answer by miradulo

The median absolute deviation of a column,

MAD = median(|x_i - median(x)|),

can also be calculated with statsmodels.robust.scale.mad. That function divides the result by a normalization constant c, which defaults to about 0.6745 so that the MAD estimates the standard deviation of normally distributed data; passing c=1, as here, gives the raw MAD.

>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
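The same quantity can be sketched with NumPy alone, assuming the behavior described above (median absolute deviation around the median, divided by c). The helper mad_numpy is a hypothetical stand-in written for illustration, not statsmodels' source; its default c is the 0.75 quantile of the standard normal distribution, taken from the standard library:

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def mad_numpy(values, c=NormalDist().inv_cdf(0.75)):
    """Median absolute deviation around the median, divided by c.

    The default c (about 0.6745) scales the result so it estimates the
    standard deviation for normally distributed data; c=1 returns the raw MAD.
    """
    arr = np.asarray(values, dtype=float)
    return np.median(np.abs(arr - np.median(arr))) / c


counts = pd.Series([2000, 1900, 900, 500, 100])
print(mad_numpy(counts, c=1))  # 800.0, the raw MAD
print(mad_numpy(counts))       # about 1186.08 (800 / 0.6745)
```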

Answer by Marjan Alavi

If you want to see the median, you can use df.describe(). The 50% value is the median.
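A small sketch with the question's sample data: the "50%" entry of describe() can be read directly, and Series.quantile(0.5) returns the same value:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

summary = df["count"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary["50%"])             # 900.0
print(df["count"].quantile(0.5))  # 900.0, the same median
```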
