Sort pandas DataFrame by multiple columns and duplicated index

Source: Stack Overflow question, http://stackoverflow.com/questions/33193468/. Content is licensed under CC BY-SA 4.0; if you use or share it, you must attribute it to the original authors.

Tags: python, pandas

Asked by user2034412

I have a pandas DataFrame with duplicated indices. There are 3 rows with each index, and they correspond to a group of items. There are two columns, a and b.

import pandas

df = pandas.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                       for b in range(12)]).set_index('i')

I want to sort the DataFrame so that:

  1. All of the rows with the same index are adjacent (all of the groups are together).
  2. The groups are in descending order by the lowest value of a within the group.

For example, in the above df, the first three items should be the ones with index 0, because the lowest a value for those three rows is 2, and all of the other groups have at least one row with an a value lower than 2. The second three items could be either group 3 or group 1, because the lowest a value in both of those groups is 1. The last group of items should be group 2, because it has a row with an a value of 0.

  3. Within each group, the items are sorted in ascending order by b.

Desired output:

   a   b
i
0  6   0
0  2   4
0  2   8
3  3   3
3  1   7
3  5  11
1  5   1
1  1   5
1  3   9
2  4   2
2  0   6
2  4  10

I've been trying something like:

df.groupby('i')[['a']].transform(min).sort(['a', 'b'], ascending=[0, 1])

But it gives me a KeyError, and it only gets that far if I make i a column instead of an index anyway.
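
The traceback is not shown, so the following is only a guess at where the KeyError comes from: selecting [['a']] before transform leaves a frame that contains column a only, so a later sort that also names b has nothing to look up. The sketch below assumes a recent pandas, where sort_values replaces the older sort method and groupby accepts the index name directly.

```python
import pandas as pd

df = pd.DataFrame([{"i": b % 4, "a": abs(b - 6), "b": b}
                   for b in range(12)]).set_index("i")

# Selecting [['a']] before transform keeps only column 'a' in the result.
mins = df.groupby("i")[["a"]].transform("min")
print(list(mins.columns))

# Sorting by 'b' on that frame fails because 'b' is no longer present.
try:
    mins.sort_values(["a", "b"], ascending=[False, True])
    raised = False
except KeyError:
    raised = True
print(raised)
```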

Answered by Alexander

You can first sort by a in descending order and then sort your index:

>>> df.sort_values(['a', 'b'], ascending=[False, True]).sort_index()
   a   b
i       
0  6   0
0  2   4
0  2   8
1  5   1
1  3   9
1  1   5
2  4   2
2  4  10
2  0   6
3  5  11
3  3   3
3  1   7
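
A self-contained sketch of the same two-step sort for recent pandas versions follows. Two things here are assumptions rather than part of the answer: DataFrame.sort is written as sort_values, and kind='mergesort' is passed to sort_index so that the order produced by the first sort is kept among rows sharing an index value.

```python
import pandas as pd

df = pd.DataFrame([{"i": b % 4, "a": abs(b - 6), "b": b}
                   for b in range(12)]).set_index("i")

# Step 1: order rows by a (descending), breaking ties by b (ascending).
by_value = df.sort_values(["a", "b"], ascending=[False, True])

# Step 2: group equal index values together; mergesort is a stable sort,
# so rows with the same index keep the order from step 1.
result = by_value.sort_index(kind="mergesort")
print(result)
```

With this approach the groups come out in index order (0, 1, 2, 3) and each group is ordered by a descending.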

Answered by chrisb

The most straightforward way I see is moving your index to a column, and calculating a new column with the group min.

In [43]: df = df.reset_index()

In [45]: df['group_min'] = df.groupby('i')['a'].transform('min')
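
To see what the transform step produces, here is a minimal sketch that computes the per-group minimum of a on the original frame. As a variation that is not part of the answer, it groups on the index level with groupby(level='i') instead of resetting the index first.

```python
import pandas as pd

df = pd.DataFrame([{"i": b % 4, "a": abs(b - 6), "b": b}
                   for b in range(12)]).set_index("i")

# transform('min') returns one value per row: the smallest a in that row's group.
group_min = df.groupby(level="i")["a"].transform("min")
print(group_min.tolist())
```

Each row receives its group's minimum (2 for group 0, 1 for groups 1 and 3, 0 for group 2), which is what makes the value usable as a sort key.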

Then you can sort by your conditions:

In [49]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True])
Out[49]: 
    i  a   b  group_min
0   0  6   0          2
4   0  2   4          2
8   0  2   8          2
3   3  3   3          1
7   3  1   7          1
11  3  5  11          1
1   1  5   1          1
5   1  1   5          1
9   1  3   9          1
2   2  4   2          0
6   2  0   6          0
10  2  4  10          0

To get back to your desired frame, drop the helper column and set i as the index again.

In [50]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True]).drop('group_min', axis=1).set_index('i')
Out[50]: 
   a   b
i       
0  6   0
0  2   4
0  2   8
3  3   3
3  1   7
3  5  11
1  5   1
1  1   5
1  3   9
2  4   2
2  0   6
2  4  10
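
The steps above can be gathered into one function, sketched here under some assumptions: the function name sort_groups, its parameter names, and the temporary column name _group_min are all made up for illustration, and the index is assumed to be a single named level.

```python
import pandas as pd

def sort_groups(df, key="a", within="b"):
    """Order groups by their minimum `key` (descending), rows by `within` (ascending)."""
    name = df.index.name                      # assumed: a single, named index level
    out = df.reset_index()
    # Helper column holding each group's smallest `key` value.
    out["_group_min"] = out.groupby(name)[key].transform("min")
    # Sorting by the index column as well keeps tied groups from interleaving.
    out = out.sort_values(["_group_min", name, within],
                          ascending=[False, False, True])
    return out.drop(columns="_group_min").set_index(name)

df = pd.DataFrame([{"i": b % 4, "a": abs(b - 6), "b": b}
                   for b in range(12)]).set_index("i")
result = sort_groups(df)
print(result)
```

Including the index column in the sort keys matters here because groups 1 and 3 share the same minimum; without it their rows would be ordered by b alone and end up mixed together.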