Pandas sorting by value and then by index
Original question: http://stackoverflow.com/questions/33699555/
Source: Stack Overflow. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by sparc_spread
I have the following dataset:
import numpy as np
from pandas import DataFrame
import numpy.random as random
random.seed(12)
df = DataFrame(
    {
        "fac1": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "val": random.choice(np.arange(0, 20), 8, replace=False),
    }
)
df2 = df.set_index(["fac1"])
df2
What I want is to sort by val within each fac1 group, to produce this:
I have combed the documentation and cannot find a straightforward way. The best I could do was the following hack:
df3 = df2.reset_index()
df4 = df3.sort_values(["fac1","val"],ascending=[True,True],axis=0)
df5 = df4.set_index(["fac1"])
df5
# Produces the picture above
(I realize the above could benefit from multiple inplace options; I am doing it this way just to make the intermediate products clear.)
I did find this SO post, which uses grouping and a sorting function. However the following code, adapted from that post, produced an incorrect result:
df2.groupby("fac1",axis=1).apply(lambda x : x.sort_values("val"))
(Output removed for space considerations)
Is there another way to approach this?
Update: Solution
The accepted solution is:
df2.sort_values(by='val').sort_index(kind='mergesort')
The sorting algorithm must be mergesort, and it must be specified explicitly because it is not the default. As the sort_index documentation points out, "mergesort is the only stable algorithm." Here's another sample dataset that will not sort properly if you don't specify mergesort for kind:
random.seed(12)
n = 32  # number of rows; avoids shadowing the built-in len
df = DataFrame(
    {
        "fac1": ["a"] * (n // 2) + ["b"] * (n // 2),
        "val": random.choice(np.arange(0, 100), n, replace=False),
    }
)
df2 = df.set_index(["fac1"])
df2.sort_values(by='val').sort_index()
(I am omitting all outputs for space considerations.)
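Since the outputs are omitted, one way to check a result is to test it programmatically. The sketch below rebuilds a 32-row dataset like the one above and verifies that the mergesort version is ordered by index and by val within each group. The helper name is_sorted_within_groups is hypothetical (it is not part of pandas), and the dataset is an assumed reconstruction rather than the exact frame from the question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the 32-row sample dataset (seeded for repeatability).
rng = np.random.RandomState(12)
n = 32
df2 = pd.DataFrame(
    {
        "fac1": ["a"] * (n // 2) + ["b"] * (n // 2),
        "val": rng.choice(np.arange(0, 100), n, replace=False),
    }
).set_index("fac1")


def is_sorted_within_groups(frame, col="val"):
    """Hypothetical helper: True if `col` is non-decreasing inside every index group."""
    per_group = frame.groupby(level=0)[col].apply(lambda s: s.is_monotonic_increasing)
    return bool(per_group.all())


# Sort by value first, then stably by index so the value order survives within each group.
stable = df2.sort_values(by="val").sort_index(kind="mergesort")

print(stable.index.is_monotonic_increasing)
print(is_sorted_within_groups(stable))
```

Running the same check on the result of a plain sort_index() call shows whether the default algorithm happened to preserve the value order for a given dataset and pandas version.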
Answered by Sam
EDIT: I looked into the documentation and the default sorting algorithm for sort_index is quicksort. This is NOT a "stable" algorithm, in that it does not preserve "the input order of equal elements in the sorted output" (from Wikipedia). However, sort_index gives you the option to choose "mergesort", which IS a stable sorting algorithm. So the fact that my original answer,
df2.sort_values(by='val').sort_index()
, worked, was simply happenstance. This code should work every time, since it uses a stable sorting algorithm:
df2.sort_values(by='val').sort_index(kind = 'mergesort')
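As a small illustration of why the stable sort matters, the sketch below uses a tiny hand-made frame (not the data from the question) and compares the two-step stable sort with the reset_index / sort_values / set_index approach from the question. It also shows a single sort_values call that names the index level directly; that last variant assumes a pandas version in which `by` may refer to index level names, so treat it as an assumption to verify against your installed version.

```python
import pandas as pd

# Tiny hand-made example; the values are illustrative only.
df2 = pd.DataFrame(
    {"val": [5, 2, 9, 1, 7, 3]},
    index=pd.Index(["b", "a", "b", "a", "a", "b"], name="fac1"),
)

# Step 1: order rows by value. Step 2: stable sort on the index, which keeps
# the value order among rows that share the same index label.
stable = df2.sort_values(by="val").sort_index(kind="mergesort")

# The approach from the question: move the index into a column, sort on both, restore it.
hack = df2.reset_index().sort_values(["fac1", "val"]).set_index("fac1")

# Assumption: this pandas version lets `by` name an index level.
one_call = df2.sort_values(by=["fac1", "val"])

pd.testing.assert_frame_equal(stable, hack)
pd.testing.assert_frame_equal(stable, one_call)
print(stable)
```

All three results agree here because the values are unique, so the order within each group is fully determined by val.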