How to sort a pivot table in Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/45194006/
Asked by Cheng
Here is the code:
import pandas as pd

test = pd.DataFrame({'country':['us','ca','ru','cn','ru','cn','us','ca','ru','cn','us','ca','ru','cn','us','ca'], 'month':[5,6,7,5,6,7,5,5,6,7,5,6,6,5,5,6], 'id':list(range(16))})
# count the rows for each (month, country) pair
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]
The output looks like this:
               id
month country
5     ca        1
      cn        2
      us        4
6     ca        3
      ru        3
7     cn        2
      ru        1
I'd like to sort the table by the id column, so that the largest number appears on top within each month, like:
               id
month country
5     us        4
      cn        2
      ca        1
Accepted answer by jezrael
You need DataFrame.reset_index, DataFrame.sort_values and DataFrame.set_index:
# month ascending, id descending; parentheses allow the chain to span lines
p1 = (p.reset_index()
       .sort_values(['month', 'id'], ascending=[True, False])
       .set_index(['month', 'country']))
print(p1)
               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
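The three steps above can be bundled into a small helper. This is only a minimal sketch: the function name sort_within_month is hypothetical (it does not appear in the original answer), and it assumes the index levels are named month and country as in the question.

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]


def sort_within_month(table):
    """Hypothetical helper: month ascending, id descending inside each month."""
    return (
        table.reset_index()                      # index levels become columns
             .sort_values(['month', 'id'], ascending=[True, False])
             .set_index(['month', 'country'])    # restore the MultiIndex
    )


p1 = sort_within_month(p)
print(p1)
```

Moving the index levels into columns is what lets sort_values see month and id side by side.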
because this more direct solution does not work (the final sort_values on id breaks up the month groups) :(
p1 = p.sort_index(level='month', sort_remaining=True) \
.sort_values('id', ascending=False)
print (p1)
               id
month country
5     us        4
6     ca        3
      ru        3
5     cn        2
7     cn        2
5     ca        1
7     ru        1
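As a hedged aside that is not part of the original answer: assuming pandas 0.23 or later, where sort_values is documented to accept index level names alongside column labels in by, the round trip through reset_index may be unnecessary. A minimal sketch under that assumption:

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

# 'month' is an index level and 'id' is a column; both are named in `by`
# (assumption: pandas >= 0.23)
direct = p.sort_values(['month', 'id'], ascending=[True, False])
print(direct)
```

If the installed pandas version rejects the level name, the reset_index / sort_values / set_index chain remains the fallback.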
Answer by piRSquared
Option 1
This sorts by id within the groups defined by the month level of the index.
p.groupby(
level='month', group_keys=False
).apply(pd.DataFrame.sort_values, by='id', ascending=False)
               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
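The same group-wise sort can also be spelled with a lambda, which some readers may find easier to follow than passing pd.DataFrame.sort_values as the function. This is a sketch of an equivalent form, not the answer's own code; the variable name by_group is made up for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

# group_keys=False keeps the original (month, country) index
# instead of prepending the group key as an extra level
by_group = p.groupby(level='month', group_keys=False).apply(
    lambda g: g.sort_values('id', ascending=False)
)
print(by_group)
```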
Option 2
This first sorts the entire dataframe by id, then sorts again by the month level of the index. However, I had to use sort_remaining=False so that the country level is not sorted as well (which would undo the id ordering), and kind='mergesort' because mergesort is a stable sort and won't disturb the pre-existing order within the groups defined by the 'month' level.
p.sort_values('id', ascending=False) \
.sort_index(level='month', sort_remaining=False, kind='mergesort')
               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
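Why stability matters here can be seen on plain arrays. The sketch below uses made-up arrays that mimic the table after it has been sorted by id descending; a stable sort on month then keeps that id order inside each month.

```python
import numpy as np

# rows already ordered by id, largest first (illustrative values)
ids = np.array([4, 3, 3, 2, 2, 1, 1])
months = np.array([5, 6, 6, 5, 7, 5, 7])

# a stable sort keeps equal months in their current relative order,
# so the descending id order survives within each month
order = np.argsort(months, kind='mergesort')

print(months[order])
print(ids[order])
```

With an unstable algorithm the rows of one month could come back in any relative order, which is why the answer pins kind='mergesort'.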
Option 3
This uses numpy's lexsort. It works, but I don't like it because it depends on id being numeric and on my being able to put a negative sign in front of it to get descending order. /shrug
import numpy as np

# lexsort treats the LAST key as the primary one: month first, then -id
p.iloc[np.lexsort([-p['id'].values, p.index.get_level_values('month')])]
               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
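The key order passed to lexsort is easy to get backwards, so here is a small sketch on standalone arrays. The values are made up for illustration; the point is that the last key in the list is the primary sort key and that negating a numeric key flips it to descending.

```python
import numpy as np

month = np.array([5, 6, 7, 5, 6, 7, 5])
ids = np.array([1, 3, 2, 2, 3, 1, 4])

# keys are listed from least to most significant:
# month (last) is the primary key, -ids breaks ties with larger ids first
order = np.lexsort([-ids, month])

print(month[order])
print(ids[order])
```

The negation trick only applies to numeric keys, which is the limitation the answer complains about.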