How to create Decile and Quintile columns to rank another variable based on size using Python and Pandas?

Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/26496356/

Tags: python, pandas, ranking

Asked by finstats

I have a data frame with a column, Investment, which represents the amount invested by a trader. I would like to create two new columns in the data frame: one giving a decile rank and the other a quintile rank, both based on the size of Investment. I want 1 to represent the decile with the largest investments and 10 the smallest. Similarly, I want 1 to represent the quintile with the largest investments and 5 the smallest.

I am new to Pandas, so is there a way that I can easily do this? Thanks!

Accepted answer by Dan Frank

The functionality you're looking for is in pandas.qcut: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html

In [51]: import numpy as np

In [52]: import pandas as pd

In [53]: investment_df = pd.DataFrame(np.arange(10), columns=['investment'])

In [54]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=False)

In [55]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=False)

In [56]: investment_df
Out[56]: 
   investment  decile  quintile
0           0       0         0
1           1       1         0
2           2       2         1
3           3       3         1
4           4       4         2
5           5       5         2
6           6       6         3
7           7       7         3
8           8       8         4
9           9       9         4   
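
With labels=False, qcut returns 0-based integer codes, where 0 is the bin holding the smallest values. The following is a minimal sketch, not part of the original answer, of how one might inspect the bin edges qcut chose and shift the codes to start at 1. The column name quintile_1based is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

investment_df = pd.DataFrame(np.arange(10), columns=["investment"])

# retbins=True additionally returns the bin edges that qcut computed
codes, edges = pd.qcut(
    investment_df["investment"], 5, labels=False, retbins=True
)

# codes are 0-based; adding 1 numbers the quintiles 1 (smallest) to 5 (largest)
investment_df["quintile_1based"] = codes + 1

print(edges)
print(investment_df)
```

Five bins are described by six edges, so edges has one more entry than the number of quantiles requested.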

It's nonstandard to label the largest percentile with the smallest number, but you can do this by passing the labels in reverse order:

In [60]: investment_df['quintile'] = pd.qcut(investment_df['investment'], 5, labels=np.arange(5, 0, -1))

In [61]: investment_df['decile'] = pd.qcut(investment_df['investment'], 10, labels=np.arange(10, 0, -1))

In [62]: investment_df
Out[62]: 
   investment decile quintile
0           0     10        5
1           1      9        5
2           2      8        4
3           3      7        4
4           4      6        3
5           5      5        3
6           6      4        2
7           7      3        2
8           8      2        1
9           9      1        1
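
The toy data above has ten distinct values, so every bin edge is unique. With real investment amounts that repeat, qcut can raise a ValueError about non-unique bin edges. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to rank the values first (largest gets rank 1) and bin the ranks instead. The sample amounts and the name size_rank are hypothetical.

```python
import pandas as pd

# Hypothetical data with repeated amounts; not from the original question
df = pd.DataFrame(
    {"investment": [500, 120, 120, 120, 900, 40, 310, 120, 75, 660]}
)

# Rank from largest (1) to smallest; method="first" breaks ties by row order
size_rank = df["investment"].rank(method="first", ascending=False)

# Bin the ranks: code 0 now holds the largest investments, so +1 gives label 1
df["decile"] = pd.qcut(size_rank, 10, labels=False) + 1
df["quintile"] = pd.qcut(size_rank, 5, labels=False) + 1

print(df.sort_values("investment", ascending=False))
```

This also yields plain integer columns rather than categorical ones. The trade-off is that equal amounts can land in different bins, because ties are split by row order.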