Sorting a Pandas DataFrame by multiple columns with a duplicated index

Question

Asked by user2034412

I have a pandas DataFrame with duplicated index values. There are three rows for each index value, and each set of three rows corresponds to a group of items. There are two columns, a and b.

df = pandas.DataFrame([{'i': b % 4, 'a': abs(b - 6) , 'b': b}
                       for b in range(12)]).set_index('i')

I want to sort the DataFrame so that:

- All of the rows with the same index are adjacent (all of the groups are together).
- The groups are in descending order of the lowest value of a within each group.

For example, in the above df, the first three items should be the ones with index 0, because the lowest a value for those three rows is 2, and all of the other groups have at least one row with an a value lower than 2. The second three items could be either group 3 or group 1, because the lowest a value in both of those groups is 1. The last group of items should be group 2, because it has a row with an a value of 0.
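
The group minima this reasoning relies on can be checked directly. A minimal sketch, assuming pandas 1.x or later and importing it as pd rather than pandas; group_min is an illustrative name:

```python
import pandas as pd

# Rebuild the question's frame: i = b % 4 (index), a = |b - 6|, b = 0..11
df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i')

# Lowest value of a within each index group
group_min = df.groupby(level='i')['a'].min()
print(group_min)  # group 0 -> 2, groups 1 and 3 -> 1, group 2 -> 0
```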

Within each group, the items are sorted in ascending order by b.

Desired output:

I've been trying something like:

df.groupby('i')[['a']].transform(min).sort(['a', 'b'], ascending=[0, 1])

But it gives me a KeyError, and it only gets that far if I make i a column instead of an index anyway.
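
One plausible reading of the KeyError, sketched under the assumption of pandas 1.x or later (where groupby also accepts an index level name): selecting [['a']] before transform means the transformed frame contains only column a, so a later sort on 'b' has nothing to find. DataFrame.sort itself was removed in pandas 0.20, so the sketch uses sort_values in its place; mins is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i')

# transform on the [['a']] selection returns a frame with only column 'a',
# aligned row by row with df; column 'b' is not carried along.
mins = df.groupby(level='i')[['a']].transform('min')
print(list(mins.columns))  # ['a']

# Any later sort that mentions 'b' therefore cannot find it in `mins`.
try:
    mins.sort_values(['a', 'b'])
except KeyError as exc:
    print('KeyError:', exc)
```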

Answer 1

Answer by Alexander

You can first sort by a in descending order and then sort your index:

>>> df.sort_values(['a', 'b'], ascending=[False, True]).sort_index(kind='mergesort')
   a   b
i       
0  6   0
0  2   4
0  2   8
1  5   1
1  3   9
1  1   5
2  4   2
2  4  10
2  0   6
3  5  11
3  3   3
3  1   7
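
A self-contained version of this approach for current pandas, where DataFrame.sort no longer exists (it was removed in 0.20 in favor of sort_values and sort_index). Passing kind='mergesort' to sort_index is an addition here: the default quicksort is not guaranteed to be stable, so mergesort is used to keep the a-descending order inside each index group. Note that this orders the groups by index label (0, 1, 2, 3) and the rows within each group by a descending; out is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i')

# Sort by a descending (ties broken by b ascending), then regroup by index.
# kind='mergesort' makes sort_index stable, so the a/b order survives within each group.
out = (df.sort_values(['a', 'b'], ascending=[False, True])
         .sort_index(kind='mergesort'))
print(out)
```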

Answer 2

Answer by chrisb

The most straightforward way I see is to move your index into a column and calculate a new column holding each group's minimum of a.

In [43]: df = df.reset_index()

In [45]: df['group_min'] = df.groupby('i')['a'].transform('min')
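
transform('min') differs from an aggregating min(): it returns one value per original row, each row receiving its own group's minimum, which is why the result can be stored directly as a column. A small sketch, assuming pandas 1.x or later:

```python
import pandas as pd

df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i')
df = df.reset_index()  # move 'i' back to a column, as in the answer

# Broadcast each group's minimum of a back onto every row of that group;
# the result has the same length and row order as df.
df['group_min'] = df.groupby('i')['a'].transform('min')
print(df[['i', 'a', 'group_min']].head(4))
```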

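Then you can sort by your conditions: group_min descending, then i (so that groups sharing the same minimum, here 1 and 3, stay contiguous), then b ascending: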

In [49]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True])
Out[49]: 
    i  a   b  group_min
0   0  6   0          2
4   0  2   4          2
8   0  2   8          2
3   3  3   3          1
7   3  1   7          1
11  3  5  11          1
1   1  5   1          1
5   1  1   5          1
9   1  3   9          1
2   2  4   2          0
6   2  0   6          0
10  2  4  10          0
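
Including i as the middle sort key matters because groups 1 and 3 share the same group_min of 1. A sketch comparing the sort with and without it, assuming pandas 1.x or later; no_tiebreak and with_tiebreak are illustrative names:

```python
import pandas as pd

df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i').reset_index()
df['group_min'] = df.groupby('i')['a'].transform('min')

# Without 'i' in the sort keys, groups 1 and 3 (both with group_min == 1)
# form one run and their rows interleave by b.
no_tiebreak = df.sort_values(['group_min', 'b'], ascending=[False, True])
print(no_tiebreak['i'].tolist())  # [0, 0, 0, 1, 3, 1, 3, 1, 3, 2, 2, 2]

# Adding 'i' as a middle key keeps each group contiguous.
with_tiebreak = df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True])
print(with_tiebreak['i'].tolist())  # [0, 0, 0, 3, 3, 3, 1, 1, 1, 2, 2, 2]
```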

To get back to your desired frame, drop the helper column and set i back as the index.

In [50]: df.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True]).drop('group_min', axis=1).set_index('i')
Out[50]: 
   a   b
i       
0  6   0
0  2   4
0  2   8
3  3   3
3  1   7
3  5  11
1  5   1
1  1   5
1  3   9
2  4   2
2  0   6
2  4  10
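
A variant that keeps i in the index throughout, sketched under the assumption of pandas 0.23 or later, where sort_values accepts index level names alongside column labels. It attaches the group minimum with assign instead of reset_index, so there is no index to restore afterwards; tmp and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([{'i': b % 4, 'a': abs(b - 6), 'b': b}
                   for b in range(12)]).set_index('i')

# Attach each row's group minimum of a as a temporary column.
# transform returns the same index as df, so assign lines up row by row.
tmp = df.assign(group_min=df.groupby(level='i')['a'].transform('min'))

# 'i' is an index level here; sort_values can still use it as a key:
# group_min descending, then i descending (tie-break), then b ascending.
out = (tmp.sort_values(['group_min', 'i', 'b'], ascending=[False, False, True])
          .drop(columns='group_min'))
print(out)
```

Because b is unique in this data, the last key fully determines the order, so the result does not depend on sort stability.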