How to create Decile and Quintile columns to rank another variable based on size using Python, Pandas?
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/26496356/
Asked by finstats
I have a data frame with a column named Investment, which represents the amount invested by a trader. I would like to create 2 new columns in the data frame: one giving a decile rank and the other a quintile rank, both based on the Investment size. I want 1 to represent the decile with the largest investments and 10 the smallest. Similarly, I want 1 to represent the quintile with the largest investments and 5 the smallest.
I am new to Pandas, so is there a way that I can easily do this? Thanks!
Accepted answer by Dan Frank
The functionality you're looking for is in pandas.qcut: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html
In [51]: import numpy as np
In [52]: import pandas as pd
In [53]: investment_df = pd.DataFrame(np.arange(10), columns=['investment'])
In [54]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=False)
In [55]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=False)
In [56]: investment_df
Out[56]:
   investment  decile  quintile
0           0       0         0
1           1       1         0
2           2       2         1
3           3       3         1
4           4       4         2
5           5       5         2
6           6       6         3
7           7       7         3
8           8       8         4
9           9       9         4
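To see where the bin boundaries fall, qcut can also return the edges it computed. The following is a minimal sketch on the same 0..9 sample data, not part of the original answer; the variable names `codes` and `edges` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer: the integers 0..9
investment = pd.Series(np.arange(10), name="investment")

# labels=False returns integer bin codes (0 = smallest values);
# retbins=True additionally returns the quantile edges used for the bins
codes, edges = pd.qcut(investment, 5, labels=False, retbins=True)

print(codes.tolist())
print(edges)
```

With five bins over ten evenly spaced values, each bin holds two rows, which matches the quintile column in the output above.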
It's nonstandard to label the largest quantile with the smallest number, but you can do this by passing the labels in reverse order:
In [60]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=np.arange(5, 0, -1))
In [61]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=np.arange(10, 0, -1))
In [62]: investment_df
Out[62]:
   investment  decile  quintile
0           0      10         5
1           1       9         5
2           2       8         4
3           3       7         4
4           4       6         3
5           5       5         3
6           6       4         2
7           7       3         2
8           8       2         1
9           9       1         1
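Real investment data often contains repeated amounts, in which case qcut may fail with a "Bin edges must be unique" error. One possible workaround, sketched here under that assumption and not taken from the original answer, is to rank the values first with rank(method="first") and bin the ranks instead. Ranking in descending order also makes 1 the largest group without passing reversed labels. The sample amounts are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with many repeated amounts
df = pd.DataFrame(
    {"investment": [500, 100, 100, 100, 100, 100, 100, 100, 250, 900]}
)

# ascending=False gives rank 1 to the largest investment;
# method="first" breaks ties by order of appearance, so every rank is unique
ranks = df["investment"].rank(method="first", ascending=False)

# labels=False yields 0-based bin codes, so add 1 to get 1..10 and 1..5
df["decile"] = pd.qcut(ranks, 10, labels=False) + 1
df["quintile"] = pd.qcut(ranks, 5, labels=False) + 1

print(df)
```

Note that tied amounts can land in different groups with this approach, because ties are split by row order; whether that is acceptable depends on the use case.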

