Original question: http://stackoverflow.com/questions/28371308/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Sort by column within multi index level in pandas
Asked by Dickster
I have a sorting requirement, illustrated by the example below.
Do I need to call reset_index(), then sort(), and then set_index(), or is there a slicker way to do this?
import pandas as pd

l = [[1, 'A', 99], [1, 'B', 102], [1, 'C', 105], [1, 'D', 97], [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(l, columns=['idx1', 'idx2', 'col1'])
df.set_index(['idx1', 'idx2'], inplace=True)
# assume data has been received like this...
print(df)
           col1
idx1 idx2
1    A       99
     B      102
     C      105
     D       97
2    A       19
     B       14
     C       10
     D       17
# I'd like to sort descending on col1, partitioned within each value of index level 'idx1'
           col1
idx1 idx2
1    C      105
     B      102
     A       99
     D       97
2    A       19
     D       17
     B       14
     C       10
Thank you for the answer. Note that I changed the data slightly:
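l = [[1, 'A', 99], [1, 'B', 11], [1, 'C', 105], [1, 'D', 97], [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(l, columns=['idx1', 'idx2', 'col1'])
df.set_index(['idx1', 'idx2'], inplace=True)
# originally written as df.sort_index(by='col1', ascending=False); the by kwarg is deprecated in newer pandas
df = df.sort_values('col1', ascending=False)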
However, the output is:
           col1
idx1 idx2
1    C      105
     A       99
     D       97
2    A       19
     D       17
     B       14
1    B       11
2    C       10
I would have wanted it to be:
           col1
idx1 idx2
1    C      105
     A       99
     D       97
     B       11
2    A       19
     D       17
     B       14
     C       10
Accepted answer by JAB
You can use sort_index (this relies on the by kwarg, which newer pandas versions deprecate):
df.sort_index(by='col1', ascending=False)
This outputs:
           col1
idx1 idx2
1    C      105
     B      102
     A       99
     D       97
2    A       19
     D       17
     B       14
     C       10
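For pandas versions where sort_index no longer accepts by, the same call can be spelled with sort_values. The sketch below assumes a recent pandas and uses the question's original data. Note that this is a global sort on col1, so it only produces the grouped result because every value under idx1 = 1 happens to be larger than every value under idx1 = 2.

```python
import pandas as pd

rows = [[1, 'A', 99], [1, 'B', 102], [1, 'C', 105], [1, 'D', 97],
        [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(rows, columns=['idx1', 'idx2', 'col1']).set_index(['idx1', 'idx2'])

# Modern spelling of the older df.sort_index(by='col1', ascending=False).
# This sorts all rows by col1 and ignores the index levels entirely.
result = df.sort_values('col1', ascending=False)
print(result)
```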
Answer by jezrael
You need DataFrame.reset_index, DataFrame.sort_values and DataFrame.set_index:
import pandas as pd

l = [[1, 'A', 99], [1, 'B', 11], [1, 'C', 105], [1, 'D', 97],
     [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(l, columns=['idx1', 'idx2', 'col1'])
df.set_index(['idx1', 'idx2'], inplace=True)
print(df)
           col1
idx1 idx2
1    A       99
     B       11
     C      105
     D       97
2    A       19
     B       14
     C       10
     D       17
df = (df.reset_index()
        .sort_values(['idx1', 'col1'], ascending=[True, False])
        .set_index(['idx1', 'idx2']))
print(df)
           col1
idx1 idx2
1    C      105
     A       99
     D       97
     B       11
2    A       19
     D       17
     B       14
     C       10
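The reset_index / sort_values / set_index round trip can be wrapped in a small helper if it is needed in more than one place. This is only a sketch: sort_within_level and its parameters are hypothetical names introduced here, not part of pandas, and it assumes every index level has a name.

```python
import pandas as pd

def sort_within_level(df, level, column, ascending=False):
    """Hypothetical helper: sort `column` inside each group of index level `level`."""
    index_names = list(df.index.names)  # remember the levels so the index can be restored
    return (df.reset_index()
              .sort_values([level, column], ascending=[True, ascending])
              .set_index(index_names))

rows = [[1, 'A', 99], [1, 'B', 11], [1, 'C', 105], [1, 'D', 97],
        [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(rows, columns=['idx1', 'idx2', 'col1']).set_index(['idx1', 'idx2'])

result = sort_within_level(df, 'idx1', 'col1')
print(result)
```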
EDIT:
From version 0.23.0 it is possible to use columns and index levels together in sort_values (but this is buggy at the time of writing when ascending=[True, False] is used, so it may only work correctly in newer versions):
df = df.sort_values(['idx1', 'col1'], ascending=[True, False])
print(df)
           col1
idx1 idx2
1    C      105
     A       99
     D       97
     B       11
2    A       19
     D       17
     B       14
     C       10
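Since the answer flags a bug with ascending=[True, False], one way to check an installed pandas is to compare the mixed index-level/column sort against the reset_index round trip. The sketch below assumes a pandas release where that bug is no longer present. The variable names are illustrative only.

```python
import pandas as pd

rows = [[1, 'A', 99], [1, 'B', 11], [1, 'C', 105], [1, 'D', 97],
        [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(rows, columns=['idx1', 'idx2', 'col1']).set_index(['idx1', 'idx2'])

# 'idx1' is an index level and 'col1' is a column; both are passed to sort_values.
by_levels = df.sort_values(['idx1', 'col1'], ascending=[True, False])

# Reference result from the reset_index / sort_values / set_index round trip.
round_trip = (df.reset_index()
                .sort_values(['idx1', 'col1'], ascending=[True, False])
                .set_index(['idx1', 'idx2']))

# True when the installed pandas orders both results identically.
same_order = list(by_levels.index) == list(round_trip.index)
print(same_order)
```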
Answer by Kyle
This first sorts by the desired column, then re-sorts on the idx1 MultiIndex level only, and works in up-to-date pandas versions that deprecate the by kwarg.
df.sort_values('col1', ascending=False).sort_index(level='idx1', sort_remaining=False)
Output:
           col1
idx1 idx2
1    C      105
     B      102
     A       99
     D       97
2    A       19
     D       17
     B       14
     C       10
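The output above uses the question's original data. The sketch below applies the same two-step sort to the modified data (B changed to 11), where a plain global sort would interleave the groups. It relies on the second sort keeping the col1 order inside each idx1 group when sort_remaining=False. That behaviour is treated here as an assumption rather than a documented guarantee.

```python
import pandas as pd

rows = [[1, 'A', 99], [1, 'B', 11], [1, 'C', 105], [1, 'D', 97],
        [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(rows, columns=['idx1', 'idx2', 'col1']).set_index(['idx1', 'idx2'])

# Step 1: global sort on the column, descending.
by_value = df.sort_values('col1', ascending=False)

# Step 2: regroup on idx1 only; sort_remaining=False leaves idx2 out of the sort keys.
result = by_value.sort_index(level='idx1', sort_remaining=False)
print(result)
```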
Answer by Ashish Gulati
Another way, using a groupby (on the already existing index levels) and an apply:
df.groupby(level=[0]).apply(lambda x: x.groupby(level=[1]).sum().sort_values('col1', ascending=False))
Output:
           col1
idx1 idx2
1    C      105
     B      102
     A       99
     D       97
2    A       19
     D       17
     B       14
     C       10
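The inner groupby(level=[1]).sum() in the one-liner above aggregates rows that share an idx2 label. When each (idx1, idx2) pair is unique, as in this data, a simpler variant that only sorts inside each idx1 group should give the same ordering. This is a sketch under that assumption, using group_keys=False so the existing index is kept as-is:

```python
import pandas as pd

rows = [[1, 'A', 99], [1, 'B', 102], [1, 'C', 105], [1, 'D', 97],
        [2, 'A', 19], [2, 'B', 14], [2, 'C', 10], [2, 'D', 17]]
df = pd.DataFrame(rows, columns=['idx1', 'idx2', 'col1']).set_index(['idx1', 'idx2'])

# Sort each idx1 group on col1; group_keys=False avoids adding idx1 to the index a second time.
result = (df.groupby(level='idx1', group_keys=False)
            .apply(lambda g: g.sort_values('col1', ascending=False)))
print(result)
```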

