Note: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19037159/

Reindexing a level of a MultiIndex to arbitrary order in Pandas

Tags: python, pandas

Asked by Chris Fonnesbeck

I have some code that summarizes a DataFrame containing the famous Titanic dataset as follows:

titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100],
                           labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()

This produces the following Series, with a MultiIndex based on the groupby call:

agecat      pclass  sex   
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64
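
To make the ordering issue easy to reproduce without the Titanic file, here is a minimal sketch on a made-up frame. The data and the name `toy` are assumptions for illustration only, and `agecat` is assumed to hold plain strings; with a categorical column, as returned by `pd.cut` on newer pandas versions, the group order may differ.

```python
import pandas as pd

# Made-up stand-in for the Titanic data; only the column names follow the question.
toy = pd.DataFrame({
    "agecat": ["child", "adult", "senior", "adolescent", "adult", "child"],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "survived": [1, 1, 0, 1, 0, 0],
})

# agecat holds plain strings here, so groupby sorts the group keys alphabetically.
summary = toy.groupby(["agecat", "sex"])["survived"].mean()

# Order in which the agecat labels appear in the resulting MultiIndex.
agecat_order = summary.index.get_level_values("agecat").unique().tolist()
print(agecat_order)
```

The printed list shows the labels in alphabetical order rather than the natural age order the question asks for.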

However, I want the agecat level of the MultiIndex to be in its natural order rather than alphabetical order, that is: ['child', 'adolescent', 'adult', 'senior']. If I try using reindex to do this:

titanic.groupby(['agecat', 'pclass','sex'])['survived'].mean().reindex(
    ['child', 'adolescent', 'adult', 'senior'], level='agecat')

it does not have any effect on the resulting DataFrame's MultiIndex. Should this be working, or am I using the wrong approach?

Answered by Jeff

You need to provide a MultiIndex with the rows in the desired order:

In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                   ['one', 'two', 'three']],
                           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                           names=['first', 'second'])

In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                               columns=Index(['A', 'B', 'C'], name='exp'))

In [38]: df
Out[38]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037

Simulate the reordering by sorting on the second level:

In [39]: idx = df.sortlevel(level='second').index

In [40]: idx
Out[40]: 
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]

In [41]: df.reindex(idx)
Out[41]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037
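
`DataFrame.sortlevel` has been removed from recent pandas releases, so the step above may not run as written. One way to build the same target index without it, sketched here under the assumption that the index has no duplicate entries, is to sort the existing index tuples by the position of their `second` label in a wanted order and pass the result to `reindex`. The names `wanted`, `rank` and `target` are illustrative, and the integer data replaces the random values so the result is easy to check.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("foo", "one"), ("foo", "two"), ("foo", "three"),
     ("bar", "one"), ("bar", "two"),
     ("baz", "two"), ("baz", "three"),
     ("qux", "one"), ("qux", "two"), ("qux", "three")],
    names=["first", "second"],
)
df = pd.DataFrame(np.arange(30).reshape(10, 3), index=index, columns=["A", "B", "C"])

# Arbitrary order wanted for the `second` level.
wanted = ["one", "two", "three"]
rank = {label: pos for pos, label in enumerate(wanted)}

# sorted() is stable, so rows sharing a `second` label keep their original relative order.
ordered = sorted(df.index, key=lambda tup: rank[tup[1]])
target = pd.MultiIndex.from_tuples(ordered, names=list(df.index.names))

result = df.reindex(target)
print(result)
```

Because the sort is stable, the rows within each `second` label stay in the order foo, bar, baz, qux, which is the same arrangement as in the reindexed frame above.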

A different ordering

In [42]: idx = idx[5:] + idx[:5]

In [43]: idx
Out[43]: 
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]

In [44]: df.reindex(idx)
Out[44]: 
exp                  A         B         C
first second                              
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793