How to sort pivot table in Pandas

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45194006/

Date: 2020-09-14 04:02:27  Source: igfitidea

Tags: python, pandas

Asked by Cheng

Here is the code:

import pandas as pd

test = pd.DataFrame({'country': ['us','ca','ru','cn','ru','cn','us','ca','ru','cn','us','ca','ru','cn','us','ca'],
                     'month': [5,6,7,5,6,7,5,5,6,7,5,6,6,5,5,6],
                     'id': list(range(16))})
# Count the rows for each (month, country) pair
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

The output looks like this:

[Image: the resulting pivot table]
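Because the image itself is not reproduced here, the pivot table can be rebuilt from the question's code. The following is a minimal, self-contained sketch; it only reuses the data and the `pivot_table` call from the question. Note that `pivot_table` returns its index in sorted order, so within each month the countries come out alphabetically rather than by count.

```python
import pandas as pd

# Same data as in the question
test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})

# Count the rows for each (month, country) pair
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]
print(p)
```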

I'd like to sort the table by the id column, so that the largest number appears on top, like:

               id
month country    
5     us        4
      cn        2
      ca        1

Accepted answer by jezrael

You need DataFrame.reset_index, DataFrame.sort_values and DataFrame.set_index:

# Parentheses let the method chain span several lines
p1 = (p.reset_index()
       .sort_values(['month', 'id'], ascending=[True, False])
       .set_index(['month', 'country']))
print(p1)
               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

A simpler chain of sort_index and sort_values does not work, because the final sort_values('id') reorders all rows by id and breaks up the month groups:

p1 = p.sort_index(level='month', sort_remaining=True) \
      .sort_values('id', ascending=False)
print (p1)
               id
month country    
5     us        4
6     ca        3
      ru        3
5     cn        2
7     cn        2
5     ca        1
7     ru        1
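To see why the month groups break apart, it helps to check whether the month level is still in order after each approach. The sketch below rebuilds the question's table and compares the two chains; the variable names `wrong` and `right` are only illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

# Sorting by id alone orders the whole table globally,
# so rows from different months end up interleaved.
wrong = p.sort_index(level='month').sort_values('id', ascending=False)
wrong_months = wrong.index.get_level_values('month')
print(wrong_months.is_monotonic_increasing)  # False

# Sorting by month first and id second keeps each month together.
right = (p.reset_index()
          .sort_values(['month', 'id'], ascending=[True, False])
          .set_index(['month', 'country']))
right_months = right.index.get_level_values('month')
print(right_months.is_monotonic_increasing)  # True
```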

Answer by piRSquared

Option 1
This sorts by id within the groups defined by the month level of the index.

p.groupby(
    level='month', group_keys=False
).apply(pd.DataFrame.sort_values, by='id', ascending=False)

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1


Option 2
This first sorts the entire DataFrame by id, then sorts again by the month level of the index. I had to use sort_remaining=False so that the country level is not sorted as well (which would undo the id ordering), and kind='mergesort' because mergesort is a stable sort and won't disturb the pre-existing order within the groups defined by the month level.

p.sort_values('id', ascending=False) \
 .sort_index(level='month', sort_remaining=False, kind='mergesort')

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
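The role of the stable sort can be seen without pandas at all. The following is a minimal sketch in plain Python, whose built-in `sorted` is guaranteed to be stable; the `rows` list is a hand-written copy of the pivot table's counts, included only for illustration.

```python
# (month, country, count) rows, in the pivot table's original order
rows = [
    (5, 'ca', 1), (5, 'cn', 2), (5, 'us', 4),
    (6, 'ca', 3), (6, 'ru', 3),
    (7, 'cn', 2), (7, 'ru', 1),
]

# Pass 1: sort everything by count, largest first
by_count = sorted(rows, key=lambda r: r[2], reverse=True)

# Pass 2: stable sort by month only; rows with equal months
# keep the count ordering established in pass 1
by_month = sorted(by_count, key=lambda r: r[0])

for row in by_month:
    print(row)
```

If the second pass were not stable, rows sharing a month could come out in any order, and the descending count order from the first pass would be lost.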


Option 3
This uses numpy's lexsort. It works, but I don't like it because it depends on id being numeric and on my being able to put a negative sign in front of it to get descending order. /shrug

import numpy as np

# lexsort treats the last key (month) as the primary sort key
p.iloc[np.lexsort([-p.id.values, p.index.get_level_values('month')])]

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
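The key order passed to `np.lexsort` is easy to get backwards: the last key in the list is the primary one. The sketch below shows this on two small arrays that mirror the pivot table's month and count values; the array names `month` and `count` are chosen here for illustration and are not part of the answer.

```python
import numpy as np

# Month and count values in the pivot table's original row order
month = np.array([5, 5, 5, 6, 6, 7, 7])
count = np.array([1, 2, 4, 3, 3, 2, 1])

# Primary key: month (last in the list), ascending.
# Secondary key: -count, so larger counts come first within a month.
order = np.lexsort([-count, month])

print(order)         # positions to pass to .iloc
print(count[order])  # counts in the desired order
```

Negating the key only works for numeric data, which is the limitation the answer points out; for a string column this trick is not available.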