Python: eliminate all data above a given percentile

Question

Asked by Roy Smith

I have a pandas DataFrame called data with a column called ms. I want to eliminate all the rows where data.ms is above the 95th percentile. For now, I'm doing this:

limit = data.ms.describe(90)['95%']
valid_data = data[data['ms'] < limit]

This works, but I want to generalize it to any percentile. What's the best way to do that?

Answer 1

Accepted answer by Phillip Cloud

Use the Series.quantile() method:

In [47]: from numpy.random import randn; from pandas import DataFrame

In [48]: cols = list('abc')

In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)

In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687

To filter out the rows of df where df.a is greater than or equal to the 95th percentile, do:

In [72]: df[df.a < df.a.quantile(.95)]
Out[72]:
       a      b      c
0 -1.044 -0.247 -1.149
2  0.395  0.591  0.764
3 -0.564 -2.059  0.232
4 -0.707 -0.736 -1.345
5  0.978 -0.099  0.521
6 -0.974  0.272 -0.649
7  1.228  0.619 -0.849
8 -0.170  0.458 -0.515
9  1.465  1.019  0.966
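
To generalize this to any percentile, as the question asks, the filter can be wrapped in a small helper. The following is a minimal sketch and not part of the original answer; the function name drop_above_percentile, its parameters, and the example data are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_above_percentile(df, column, pct):
    """Return the rows of df whose value in `column` is strictly below
    the pct-th percentile (pct given in percent, 0 to 100)."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    # Series.quantile expects a fraction in [0, 1], so convert from percent.
    limit = df[column].quantile(pct / 100)
    return df[df[column] < limit]


# Made-up example data: 100 rows with ms = 1, 2, ..., 100.
data = pd.DataFrame({"ms": np.arange(1, 101)})
valid_data = drop_above_percentile(data, "ms", 95)
```

Because the comparison is strict, rows exactly equal to the limit are dropped too, which matches the filter shown above.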

Answer 2

Answer by 2diabolos.com

NumPy is much faster than pandas for this kind of thing:

numpy.percentile(df.a, 95)  # note: the percentile is given in percent (5 means 5%)

is equivalent to, but 3 times faster than:

df.a.quantile(.95)  # as you already noticed, here it is ".95", not "95"

So for your code, it gives:

df[df.a < numpy.percentile(df.a, 95)]
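
One caveat when swapping the two calls: Series.quantile skips NaN values, whereas numpy.percentile returns NaN if the input contains any (numpy.nanpercentile ignores them). The sketch below, using made-up data and illustrative variable names, checks the equivalence on clean data and shows where the results diverge. It does not measure speed; timings depend on data size and library versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1, 101), dtype=float)

# Same value from both APIs; note the different scales (0-1 vs. 0-100).
q_pandas = s.quantile(0.95)
q_numpy = np.percentile(s.to_numpy(), 95)
same = np.isclose(q_pandas, q_numpy)

# With a missing value present, the two calls diverge.
s_nan = pd.concat([s, pd.Series([np.nan])], ignore_index=True)
q_pandas_nan = s_nan.quantile(0.95)                  # NaN is skipped
q_numpy_nan = np.percentile(s_nan.to_numpy(), 95)    # NaN propagates
q_nanpct = np.nanpercentile(s_nan.to_numpy(), 95)    # NaN is ignored
```

If the column may contain missing values, numpy.nanpercentile (or dropping them first) keeps the NumPy version consistent with the pandas one.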
