Ranking order per group in Pandas

Note: this page reproduces a popular Stack Overflow question and its answers, licensed under CC BY-SA 4.0. You are free to use and share it, but you must comply with the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33899369/

Date: 2020-08-19 14:09:50  Source: igfitidea


Tags: python, pandas

Asked by Amelio Vazquez-Reina

Consider a dataframe with three columns: group_ID, item_ID and value. Say we have 10 item IDs in total.

I need to rank each item_ID (1 to 10) within each group_ID based on value, and then see the mean rank (and other stats) across groups (e.g. the IDs with the highest value across groups would get ranks closer to 1). How can I do this in Pandas?

This answer does something very close with qcut, but not exactly the same.



A data example would look like:

      group_ID   item_ID  value
0   0S00A1HZEy        AB     10
1   0S00A1HZEy        AY      4
2   0S00A1HZEy        AC     35
3   0S03jpFRaC        AY     90
4   0S03jpFRaC        A5      3
5   0S03jpFRaC        A3     10
6   0S03jpFRaC        A2      8
7   0S03jpFRaC        A4      9
8   0S03jpFRaC        A6      2
9   0S03jpFRaC        AX      0

which would result in:

      group_ID   item_ID   rank
0   0S00A1HZEy        AB      2
1   0S00A1HZEy        AY      3
2   0S00A1HZEy        AC      1
3   0S03jpFRaC        AY      1
4   0S03jpFRaC        A5      5
5   0S03jpFRaC        A3      2
6   0S03jpFRaC        A2      4
7   0S03jpFRaC        A4      3
8   0S03jpFRaC        A6      6
9   0S03jpFRaC        AX      7

Accepted answer by DSM

There are lots of different arguments you can pass to rank; it looks like you can use rank("dense", ascending=False) to get the results you want, after doing a groupby:

>>> df["rank"] = df.groupby("group_ID")["value"].rank("dense", ascending=False)
>>> df
     group_ID item_ID  value  rank
0  0S00A1HZEy      AB     10     2
1  0S00A1HZEy      AY      4     3
2  0S00A1HZEy      AC     35     1
3  0S03jpFRaC      AY     90     1
4  0S03jpFRaC      A5      3     5
5  0S03jpFRaC      A3     10     2
6  0S03jpFRaC      A2      8     4
7  0S03jpFRaC      A4      9     3
8  0S03jpFRaC      A6      2     6
9  0S03jpFRaC      AX      0     7
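
For anyone who wants to reproduce the session above, here is a self-contained sketch. It rebuilds the sample data from the question and applies the same grouped dense rank. The cast to int is my own addition for display, on the assumption that the value column has no missing entries; rank itself returns floats.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})

# Dense rank within each group, highest value first.
# rank() returns floats; casting to int is safe here only because
# there are no NaN values (an assumption about the data).
df["rank"] = (
    df.groupby("group_ID")["value"]
      .rank(method="dense", ascending=False)
      .astype(int)
)

print(df)
```

With method="dense", tied values share a rank and the next distinct value gets the next integer, so a group's ranks never skip a number.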

But note that if you're not using a global ranking scheme, finding out the mean rank across groups isn't very meaningful: unless there are duplicate values in a group (and so you have duplicate rank values), all you're doing is measuring how many elements there are in a group.
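
The caveat can be checked on the sample data. This is a rough sketch, not part of the original answer. It shows that with no ties the mean rank in a group of n rows is just (n + 1) / 2, then aggregates ranks per item_ID as the question asks. It also computes a global rank over the whole value column, which is one possible reading of a "global ranking scheme". The names per_group, per_item and global_rank are my own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})

# Rank within each group, highest value first.
df["rank"] = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)

# Mean rank per group: with no ties the ranks are 1..n, so the mean
# is (n + 1) / 2 and only reflects the group size.
per_group = df.groupby("group_ID")["rank"].agg(["mean", "count"])
per_group["expected"] = (per_group["count"] + 1) / 2
print(per_group)

# Statistics of the within-group rank for each item across groups.
per_item = df.groupby("item_ID")["rank"].agg(["mean", "min", "max", "count"])
print(per_item)

# One possible "global" scheme: rank all rows together, ignoring groups.
df["global_rank"] = df["value"].rank(method="dense", ascending=False)
print(df)
```

The per-item table is the more informative one here: it compares where the same item lands in different groups, although ranks from groups of different sizes are still not on the same scale.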