Original question: http://stackoverflow.com/questions/39357882/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas DENSE RANK
Asked by Keithx
I'm working with a pandas DataFrame that looks like this:
Year Value
2012 10
2013 20
2013 25
2014 30
I want an equivalent of the SQL DENSE_RANK() OVER (ORDER BY year) function, producing an additional column like this:
Year Value Rank
2012 10 1
2013 20 2
2013 25 2
2014 30 3
How can it be done in pandas?
Thanks!
Answer by piRSquared
Answer by jezrael
The fastest solution is factorize:
df['Rank'] = pd.factorize(df.Year)[0] + 1
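One caveat worth checking: pd.factorize numbers values by order of first appearance unless sort=True is passed, so the one-liner above matches DENSE_RANK() only when Year is already in ascending order. A minimal sketch, using a made-up unsorted frame (the data and variable names are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical frame with the years deliberately out of order.
df = pd.DataFrame({"Year": [2013, 2012, 2014, 2013], "Value": [20, 10, 30, 25]})

# Default: codes follow order of first appearance (2013 -> 1, 2012 -> 2, 2014 -> 3).
by_appearance = pd.factorize(df["Year"])[0] + 1

# sort=True: codes follow the sorted unique values, like DENSE_RANK().
by_value = pd.factorize(df["Year"], sort=True)[0] + 1

print(by_appearance.tolist())  # [1, 2, 3, 1]
print(by_value.tolist())       # [2, 1, 3, 2]
```

If the frame is already sorted by Year, as in the question, both variants give the same result.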
Timings:
#len(df)=40k
df = pd.concat([df]*10000).reset_index(drop=True)
In [13]: %timeit df['Rank'] = df.Year.rank(method='dense').astype(int)
1000 loops, best of 3: 1.55 ms per loop
In [14]: %timeit df['Rank1'] = df.Year.astype('category').cat.codes + 1
1000 loops, best of 3: 1.22 ms per loop
In [15]: %timeit df['Rank2'] = pd.factorize(df.Year)[0] + 1
1000 loops, best of 3: 737 μs per loop
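To reproduce this comparison outside IPython, a rough sketch with the standard timeit module might look like the following. It rebuilds the 40k-row frame from the question's four rows; the dictionary of candidates and its labels are illustrative. Absolute numbers depend on the machine and the pandas version, so none are shown here.

```python
import timeit

import pandas as pd

# The question's four rows, repeated 10,000 times (40k rows in total).
base = pd.DataFrame({"Year": [2012, 2013, 2013, 2014], "Value": [10, 20, 25, 30]})
df = pd.concat([base] * 10000).reset_index(drop=True)

# The three expressions timed above, wrapped as zero-argument callables.
candidates = {
    "rank(dense)": lambda: df["Year"].rank(method="dense").astype(int),
    "cat.codes": lambda: df["Year"].astype("category").cat.codes + 1,
    "factorize": lambda: pd.factorize(df["Year"])[0] + 1,
}

for name, func in candidates.items():
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name}: {best * 1e6:.0f} µs per call")
```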
Answer by Alexander
You can convert the year to a categorical and then take its codes (adding one because the codes are zero-indexed and your example starts at one).
df['Rank'] = df.Year.astype('category').cat.codes + 1
>>> df
Year Value Rank
0 2012 10 1
1 2013 20 2
2 2013 25 2
3 2014 30 3
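As a small check of why this works on unsorted data too: when a sortable column is converted with astype('category'), pandas infers the categories in sorted order, so the codes follow the value order rather than the row order. A sketch with a hypothetical out-of-order frame (not the question's data):

```python
import pandas as pd

# Hypothetical frame with the years deliberately out of order.
df = pd.DataFrame({"Year": [2013, 2012, 2014, 2013], "Value": [20, 10, 30, 25]})

years = df["Year"].astype("category")

# Categories are inferred in sorted order, regardless of row order.
print(years.cat.categories.tolist())  # [2012, 2013, 2014]

# Codes index into the sorted categories; add one for a 1-based rank.
df["Rank"] = years.cat.codes + 1
print(df["Rank"].tolist())  # [2, 1, 3, 2]
```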
Answer by ALollz
Groupby.ngroup
groupby sorts keys by default, so smaller years get lower labels. Set sort=False to rank groups by order of occurrence instead.
df['Rank'] = df.groupby('Year', sort=True).ngroup()+1
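The difference between the two sort settings only shows up when the years are not already in order. A minimal sketch on a hypothetical shuffled frame (illustrative data, not from the question):

```python
import pandas as pd

# Hypothetical frame with the years deliberately out of order.
df = pd.DataFrame({"Year": [2013, 2012, 2014, 2013], "Value": [20, 10, 30, 25]})

# sort=True (the default): groups are numbered by sorted key.
sorted_rank = df.groupby("Year", sort=True).ngroup() + 1

# sort=False: groups are numbered by order of first appearance.
appearance_rank = df.groupby("Year", sort=False).ngroup() + 1

print(sorted_rank.tolist())      # [2, 1, 3, 2]
print(appearance_rank.tolist())  # [1, 2, 3, 1]
```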
np.unique
Also sorts, so use return_inverse to rank the smaller values lowest.
df['Rank'] = np.unique(df['Year'], return_inverse=True)[1]+1
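For completeness, a self-contained version of the np.unique approach with its imports, run on the question's data. With return_inverse=True, np.unique returns the sorted unique values together with, for each row, the index of its value in that sorted array; adding one turns that index into a 1-based dense rank.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014], "Value": [10, 20, 25, 30]})

# uniques: sorted distinct years; inverse: position of each row's year in uniques.
uniques, inverse = np.unique(df["Year"], return_inverse=True)
df["Rank"] = inverse + 1

print(uniques.tolist())     # [2012, 2013, 2014]
print(df["Rank"].tolist())  # [1, 2, 2, 3]
```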