pandas: creating percentile buckets in pandas

Question

Asked by nitin

I am trying to classify my data into percentile buckets based on their values. My data looks like this:

import numpy as np
import pandas as pd

a = pd.DataFrame(index=['a','b','c','d','e','f','g','h','i','j'], columns=['data'])
a['data'] = np.random.randn(10)  # ten random draws, so the values differ on every run
print(a)
print('\nthese are ranked as shown')
print(a.rank())

       data
a -0.310188
b -0.191582
c  0.860467
d -0.458017
e  0.858653
f -1.640166
g -1.969908
h  0.649781
i  0.218000
j  1.887577

these are ranked as shown
   data
a     4
b     5
c     9
d     3
e     8
f     2
g     1
h     7
i     6
j    10

To rank this data, I am using the rank function. However, I am interested in creating a bucket of the top 20%. In the example shown above, this would be a list containing the labels ['c', 'j'].

desired result : ['c','j']

How do I get the desired result?

Answer 1

Answered by Dan Allan

In [13]: df[df > df.quantile(0.8)].dropna()
Out[13]: 
       data
c  0.860467
j  1.887577

In [14]: list(df[df > df.quantile(0.8)].dropna().index)
Out[14]: ['c', 'j']
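
The session above uses df where the question uses a. As a self-contained sketch, the values printed in the question can be typed in by hand (an assumption made here so the example is reproducible) and the same 80th-percentile cut-off applied to the single column, which makes the dropna() step unnecessary. The rank(pct=True) and pd.qcut variants are alternatives added for illustration and are not part of the original answer.

```python
import pandas as pd

# Values copied by hand from the question's printed output (assumption:
# the original frame came from np.random.randn and is not reproducible).
df = pd.DataFrame(
    {"data": [-0.310188, -0.191582, 0.860467, -0.458017, 0.858653,
              -1.640166, -1.969908, 0.649781, 0.218000, 1.887577]},
    index=list("abcdefghij"),
)

# Same idea as the answer: keep rows strictly above the 80th percentile.
# Masking one column (a Series) avoids the NaN rows that dropna() removed.
threshold = df["data"].quantile(0.8)
top_20 = df.index[df["data"] > threshold].tolist()
print(top_20)

# Alternative 1: percentile ranks, building on the rank() call in the question.
pct_rank = df["data"].rank(pct=True)
top_20_by_rank = df.index[pct_rank > 0.8].tolist()

# Alternative 2: assign every row to one of five equal-sized buckets (0..4);
# bucket 4 is the top 20%.
buckets = pd.qcut(df["data"], q=5, labels=False)
top_20_by_bucket = df.index[buckets == 4].tolist()
```

pd.qcut is the variant to reach for when every row needs a bucket label rather than just the top slice. With tied values the three approaches may disagree at the bucket edges.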
