Median of a pandas DataFrame in Python
Original source: http://stackoverflow.com/questions/29778636/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
median of pandas dataframe
Asked by Ssank
I have a DataFrame df:
name  count
aaaa   2000
bbbb   1900
cccc    900
dddd    500
eeee    100
I would like to look at the rows that are within a factor of 10 of the median of the count column.
I tried df['count'].median() and got the median, but I don't know how to proceed further. Can you suggest how I could use pandas/numpy for this?
Expected output:
name count distance from median
aaaa 2000 *****
I can use any measure as the distance from the median (absolute deviation from the median, quantiles, etc.).
Accepted answer by ComputerFellow
If you're looking for how to calculate the median absolute deviation:
In [1]: df['dist'] = abs(df['count'] - df['count'].median())
In [2]: df
Out[2]:
name count dist
0 aaaa 2000 1100
1 bbbb 1900 1000
2 cccc 900 0
3 dddd 500 400
4 eeee 100 800
In [3]: df['dist'].median()
Out[3]: 800.0
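The question also asks for the rows within a factor of 10 of the median, which the session above does not show. A minimal sketch follows, assuming "within a factor of 10" means median / 10 <= count <= median * 10; that reading, and the variable name `within`, are assumptions rather than anything stated in the original thread.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()

# Absolute deviation of each row from the median, as in the answer above.
df["dist"] = (df["count"] - median).abs()

# Assumed interpretation: keep rows whose count lies in [median / 10, median * 10].
# Series.between is inclusive on both ends by default.
within = df[df["count"].between(median / 10, median * 10)]
print(within)
```

With this data every row falls inside the band, so a tighter factor (or a cutoff on `dist`) may be more useful in practice.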
Answer by miradulo
The median absolute deviation for a column can also be calculated using statsmodels.robust.scale.mad, which accepts a normalization constant c; in this case c is just 1.
>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
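If statsmodels is not installed, the same number can be reproduced with pandas alone. This is only a sketch: `mad_plain` is a hypothetical helper name, and dividing by `c` is my assumption about how the normalization constant is applied, chosen so that c=1 leaves the value unscaled.

```python
import pandas as pd

def mad_plain(values: pd.Series, c: float = 1.0) -> float:
    """Median of the absolute deviations from the median, divided by c."""
    deviations = (values - values.median()).abs()
    return float(deviations.median() / c)

counts = pd.Series([2000, 1900, 900, 500, 100], name="count")
print(mad_plain(counts))  # 800.0
```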
Answer by Marjan Alavi
If you want to see the median, you can use df.describe(). The 50% value is the median.
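As a small illustration using the question's data, the "50%" row of the summary can be read off directly and compared with median(). The variable names here are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# describe() on a numeric column returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75%, max.
summary = df["count"].describe()
median_from_describe = summary["50%"]

print(median_from_describe)   # 900.0
print(df["count"].median())   # 900.0
```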