Pandas DENSE RANK

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39357882/

Date: 2020-09-14 01:58:02  Source: igfitidea

Tags: python, sql, pandas

Asked by Keithx

I'm working with a pandas DataFrame that looks like this:

    Year Value
    2012  10
    2013  20
    2013  25
    2014  30

I want the equivalent of SQL's DENSE_RANK() OVER (ORDER BY year), producing an additional column like this:

    Year Value Rank
    2012  10    1
    2013  20    2
    2013  25    2
    2014  30    3

How can it be done in pandas?

Thanks!

Answered by piRSquared

Use pd.Series.rank with method='dense':

df['Rank'] = df.Year.rank(method='dense').astype(int)

df
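For a self-contained check, here is a minimal sketch that rebuilds the frame from the question and compares method='dense' with method='min'. The MinRank column is purely illustrative and not part of the original answer; it is there because method='min' leaves a gap after ties (as SQL's RANK() does), while method='dense' does not.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014], "Value": [10, 20, 25, 30]})

# rank() returns floats, so cast to int for clean labels.
df["Rank"] = df["Year"].rank(method="dense").astype(int)

# Illustrative comparison: "min" skips a rank after the tie on 2013.
df["MinRank"] = df["Year"].rank(method="min").astype(int)

print(df)
```

Ranking works on the values themselves, so the rows do not need to be sorted by Year beforehand.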

Answered by jezrael

The fastest solution is pd.factorize:

df['Rank'] = pd.factorize(df.Year)[0] + 1
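One caveat worth checking before relying on this: by default pd.factorize numbers values in order of first appearance, so it matches a dense rank only when Year is already sorted, as it is in the question. The sketch below uses a deliberately unsorted, made-up Year column and shows that passing sort=True restores value order.

```python
import pandas as pd

# Hypothetical unsorted data, chosen to expose the difference.
df = pd.DataFrame({"Year": [2013, 2012, 2013, 2014]})

# Default: codes follow order of first appearance (2013 is labeled 1).
df["ByAppearance"] = pd.factorize(df["Year"])[0] + 1

# sort=True: codes follow sorted unique values, i.e. a dense rank.
df["Rank"] = pd.factorize(df["Year"], sort=True)[0] + 1

print(df)
```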

Timings:

#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)

In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop

In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop

In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 μs per loop
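The numbers above come from one machine and an older pandas version, so they may not carry over. As a rough sketch for re-running the comparison outside IPython, the standard timeit module can be used; the candidates dictionary and the repeat counts below are arbitrary choices, not part of the original answer.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"Year": [2012, 2013, 2013, 2014], "Value": [10, 20, 25, 30]})
df = pd.concat([base] * 10000).reset_index(drop=True)  # len(df) == 40000

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    "rank": lambda: df["Year"].rank(method="dense").astype(int),
    "category": lambda: df["Year"].astype("category").cat.codes + 1,
    "factorize": lambda: pd.factorize(df["Year"])[0] + 1,
}

# Sanity check first: on this data all three should give the same labels.
results = {name: [int(x) for x in func()] for name, func in candidates.items()}

# Best of 3 repeats, 20 calls each, reported per call.
timings = {
    name: min(timeit.repeat(func, number=20, repeat=3)) / 20
    for name, func in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```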

Answered by Alexander

You can convert the year to a categorical and then take its codes (adding one because the codes are zero-indexed and your example starts at one).

df['Rank'] = df.Year.astype('category').cat.codes + 1

>>> df
   Year  Value  Rank
0  2012     10     1
1  2013     20     2
2  2013     25     2
3  2014     30     3
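Two properties of this approach are easy to verify with a small sketch. Categories inferred by astype('category') are sorted, so unsorted input still ranks by value, and missing values get the code -1, which becomes 0 after adding one. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical input: unsorted, with one missing year (None becomes NaN).
years = pd.Series([2013, None, 2012, 2013])

cat = years.astype("category")
rank = cat.cat.codes + 1  # NaN has code -1, so it ends up as 0

print(list(cat.cat.categories))
print(rank.tolist())
```

If 0 is not an acceptable label for missing years, those rows need separate handling.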

Answered by ALollz

GroupBy.ngroup

groupby sorts keys by default, so smaller years get lower labels. Set sort=False to rank groups by order of occurrence instead.

df['Rank'] = df.groupby('Year', sort=True).ngroup()+1
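The effect of the sort flag only shows up on unsorted data, so here is a minimal sketch with a made-up, unsorted Year column. The ByOccurrence column name is illustrative.

```python
import pandas as pd

# Hypothetical unsorted data.
df = pd.DataFrame({"Year": [2013, 2012, 2013, 2014]})

# sort=True (the default): groups are numbered in sorted key order.
df["Rank"] = df.groupby("Year", sort=True).ngroup() + 1

# sort=False: groups are numbered in order of first occurrence.
df["ByOccurrence"] = df.groupby("Year", sort=False).ngroup() + 1

print(df)
```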


np.unique

np.unique also sorts, so use return_inverse=True to rank the smaller values lowest.

df['Rank'] = np.unique(df['Year'], return_inverse=True)[1]+1
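As a small sketch of why this works: with return_inverse=True, np.unique returns the sorted unique values together with, for each original element, its index into that sorted array. Adding one to those indices gives the dense rank. The unsorted Year column is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted data.
df = pd.DataFrame({"Year": [2013, 2012, 2013, 2014]})

# uniques is sorted; inverse maps each row to its position in uniques.
uniques, inverse = np.unique(df["Year"].to_numpy(), return_inverse=True)
df["Rank"] = inverse + 1

print(uniques.tolist())
print(df)
```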