Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/17286672/
Creating percentile buckets in pandas
Asked by nitin
I am trying to classify my data into percentile buckets based on their values. My data looks like this:
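import numpy as np
import pandas as pnd

# Ten labelled rows with a single column of random normal values
a = pnd.DataFrame(index=['a','b','c','d','e','f','g','h','i','j'], columns=['data'])
a['data'] = np.random.randn(10)
print(a)
print('\nthese are ranked as shown')
print(a.rank())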
       data
a -0.310188
b -0.191582
c  0.860467
d -0.458017
e  0.858653
f -1.640166
g -1.969908
h  0.649781
i  0.218000
j  1.887577

these are ranked as shown
   data
a     4
b     5
c     9
d     3
e     8
f     2
g     1
h     7
i     6
j    10
To rank this data, I am using the rank function. However, I am interested in creating a bucket of the top 20%. In the example shown above, this would be a list containing the labels ['c', 'j'].
desired result : ['c','j']
How do I get the desired result?
Answered by Dan Allan
In [13]: df[df > df.quantile(0.8)].dropna()
Out[13]:
data
c 0.860467
j 1.887577
In [14]: list(df[df > df.quantile(0.8)].dropna().index)
Out[14]: ['c', 'j']

