Python: Median of a pandas DataFrame

Question

Asked by Ssank

I have a DataFrame df:

name   count    
aaaa   2000    
bbbb   1900    
cccc    900    
dddd    500    
eeee    100

I would like to look at the rows that are within a factor of 10 of the median of the count column.

I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas/numpy for this?

Expected output:

name  count  distance from median
aaaa  2000   *****

I can use any measure as the distance from median (absolute deviation from median, quantiles etc.).

Answer 1

Accepted answer by ComputerFellow

If you're looking for how to calculate the median absolute deviation:

In [1]: df['dist'] = abs(df['count'] - df['count'].median())

In [2]: df
Out[2]:
   name  count  dist
0  aaaa   2000  1100
1  bbbb   1900  1000
2  cccc    900     0
3  dddd    500   400
4  eeee    100   800

In [3]: df['dist'].median()
Out[3]: 800.0
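
To come back to the original question, the rows within a factor of 10 of the median can be selected with a boolean mask. This is only a minimal sketch: it assumes "within a factor of 10" means median / 10 <= count <= median * 10, and the names factor, lower, upper and within are illustrative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()  # 900.0 for this data
factor = 10                    # assumed reading of "a factor of 10"

# Keep rows whose count lies in [median / factor, median * factor].
lower, upper = median / factor, median * factor
within = df[df["count"].between(lower, upper)].copy()

# Absolute deviation from the median, as in the answer above.
within["dist"] = (within["count"] - median).abs()
print(within)
```

With this data the bounds are 90 and 9000, so every row passes the filter. A smaller factor, or a different distance measure such as a multiple of the median absolute deviation, would narrow the selection.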

Answer 2

Answered by miradulo

The median absolute deviation of a column can also be calculated using statsmodels.robust.scale.mad, which accepts a normalization constant c; in this case c is just 1.

>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
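
If statsmodels is not available, the same quantity can be computed by hand. The following is a minimal sketch with NumPy. The function median_abs_deviation is a hypothetical helper written for illustration, not the statsmodels function, and it assumes that the constant c divides the result, which is intended to mirror how statsmodels applies it.

```python
import numpy as np

def median_abs_deviation(values, c=1.0):
    """Median of |x - median(x)|, divided by the normalization constant c.

    Hypothetical helper for illustration; dividing by c is an assumption
    about how the normalization constant is applied.
    """
    values = np.asarray(values, dtype=float)
    center = np.median(values)
    return np.median(np.abs(values - center)) / c

counts = [2000, 1900, 900, 500, 100]
print(median_abs_deviation(counts))            # unscaled, c = 1
print(median_abs_deviation(counts, c=0.6745))  # scaled for normal data
```

With c = 1 this gives the raw median absolute deviation. A value of c near 0.6745 is commonly used to rescale it so that it is comparable to a standard deviation for normally distributed data.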

Answer 3

Answered by Marjan Alavi

If you want to see the median, you can use df.describe(). The 50% value is the median.
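
As a short illustration, the 50% entry of describe() can be read by label and compared with median(). This sketch rebuilds the DataFrame from the question; the variable name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# describe() on a numeric column returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75%, max.
summary = df["count"].describe()

print(summary["50%"])        # the 50th percentile, i.e. the median
print(df["count"].median())  # same value computed directly
```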
