Pandas: sort by values, then sort by index

Question

Asked by sparc_spread

I have the following dataset:

import numpy as np
import numpy.random as random
from pandas import DataFrame

random.seed(12)  # fixed seed so the example is reproducible

df = DataFrame(
    {
        "fac1": ["a", "a", "a", "a", "b", "b", "b", "b"],
        # 8 distinct integers drawn from 0..19
        "val": random.choice(np.arange(0, 20), 8, replace=False),
    }
)
df2 = df.set_index(["fac1"])
df2

What I want is to sort by val within each fac1 group, to produce this:

I have combed the documentation and cannot find a straightforward way. The best I could do was the following hack:

df3 = df2.reset_index()
df4 = df3.sort_values(["fac1","val"],ascending=[True,True],axis=0)
df5 = df4.set_index(["fac1"])
df5
# Produces the picture above

(I realize the above could benefit from multiple inplace options; I am just doing it this way to make the intermediate products clear.)
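
For comparison, the same three steps can be written as one method chain with no intermediate variables. This is only a sketch: it rebuilds the sample data with np.random.RandomState(12) so that it runs on its own, and the names rng and sorted_df are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data (rng is a local generator, an assumption of this sketch).
rng = np.random.RandomState(12)
df = pd.DataFrame(
    {
        "fac1": ["a"] * 4 + ["b"] * 4,
        "val": rng.choice(np.arange(0, 20), 8, replace=False),
    }
)
df2 = df.set_index("fac1")

# Move fac1 back to a column, sort by both keys, then restore the index.
sorted_df = (
    df2.reset_index()
    .sort_values(["fac1", "val"])
    .set_index("fac1")
)
print(sorted_df)
```

Each method returns a new DataFrame, so nothing is modified in place and df2 is left unchanged.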

I did find this SO post, which uses grouping and a sorting function. However the following code, adapted from that post, produced an incorrect result:

df2.groupby("fac1",axis=1).apply(lambda x : x.sort_values("val"))

(Output removed for space considerations)

Is there another way to approach this?

Update: Solution

The accepted solution is:

df2.sort_values(by='val').sort_index(kind='mergesort')

The sorting algorithm must be mergesort, and it must be specified explicitly because it is not the default. As the sort_index documentation points out, "mergesort is the only stable algorithm." Here's another sample dataset that will not sort properly if you don't specify mergesort for kind:

random.seed(12)

n = 32  # renamed from len to avoid shadowing the built-in

df = DataFrame(
    {
        "fac1": ["a"] * (n // 2) + ["b"] * (n // 2),
        "val": random.choice(np.arange(0, 100), n, replace=False),
    }
)
df2 = df.set_index(["fac1"])
# No kind given, so the default (unstable) quicksort is used here
df2.sort_values(by='val').sort_index()

(I am omitting all outputs for space considerations.)
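
Since the outputs are omitted, one way to check the stable version is to compare it against the reset_index approach from the question. The following is a minimal sketch, assuming a local np.random.RandomState(12) generator in place of the global seed; the names rng, expected and stable are illustrative.

```python
import numpy as np
import pandas as pd

# Build a 32-row dataset shaped like the one above (rng is an assumption of this sketch).
rng = np.random.RandomState(12)
n = 32
df = pd.DataFrame(
    {
        "fac1": ["a"] * (n // 2) + ["b"] * (n // 2),
        "val": rng.choice(np.arange(0, 100), n, replace=False),
    }
)
df2 = df.set_index("fac1")

# Reference result: sort on both keys while fac1 is still a column.
expected = df.sort_values(["fac1", "val"]).set_index("fac1")

# Two-pass approach: sort by value first, then stably by index.
stable = df2.sort_values(by="val").sort_index(kind="mergesort")

# True when both approaches produce the same rows in the same order.
print(stable.equals(expected))
```

The values are drawn without replacement, so the within-group order is unambiguous and the two frames can be compared row by row.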

Answer 1

Answered by Sam

EDIT: I looked into the documentation and the default sorting algorithm for sort_index is quicksort. This is NOT a "stable" algorithm, in that it does not preserve "the input order of equal elements in the sorted output" (from Wikipedia). However, sort_index gives you the option to choose "mergesort", which IS a stable sorting algorithm. So the fact that my original answer,

df2.sort_values(by='val').sort_index()

worked was simply happenstance. This code should work every time, since it uses a stable sorting algorithm:

df2.sort_values(by='val').sort_index(kind = 'mergesort')
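
To see what stability buys here, consider a tiny hand-made frame (the values below are made up for illustration and are not from the question). After sorting by val, the rows for each index label are already in ascending order; a stable index sort then groups equal labels together without disturbing that order.

```python
import pandas as pd

# Hypothetical four-row frame with interleaved index labels.
df = pd.DataFrame(
    {"val": [3, 0, 1, 2]},
    index=pd.Index(["a", "b", "a", "b"], name="fac1"),
)

# Pass 1: order rows by value -> labels come out as b, a, b, a.
by_val = df.sort_values(by="val")

# Pass 2: stable sort on the index keeps equal labels in their pass-1 order.
result = by_val.sort_index(kind="mergesort")

print(result)
```

With a stable sort, the two "a" rows keep the relative order they had after the first pass (1 before 3), and likewise for the "b" rows (0 before 2). An unstable algorithm is free to swap rows that share a label, which is why the version without kind only worked by chance.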
