Reindexing the levels of a MultiIndex to an arbitrary order in Pandas

Question

Asked by Chris Fonnesbeck

I have some code that summarizes a DataFrame containing the famous Titanic dataset as follows:

titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100], 
               labels=['child', 'adolescent', 'adult', 'senior'])
titanic.groupby(['agecat', 'pclass','sex']
                )['survived'].mean()

This produces the following Series with a MultiIndex based on the groupby call:

agecat      pclass  sex   
adolescent  1       female    1.000000
                    male      0.200000
            2       female    0.923077
                    male      0.117647
            3       female    0.542857
                    male      0.125000
adult       1       female    0.965517
                    male      0.343284
            2       female    0.868421
                    male      0.078125
            3       female    0.441860
                    male      0.159184
child       1       female    0.000000
                    male      1.000000
            2       female    1.000000
                    male      1.000000
            3       female    0.483871
                    male      0.324324
senior      1       female    1.000000
                    male      0.142857
            2       male      0.000000
            3       male      0.000000
Name: survived, dtype: float64

However, I want the agecat level of the MultiIndex to be ordered naturally rather than alphabetically, that is: ['child', 'adolescent', 'adult', 'senior']. But when I try using reindex to do this:

titanic.groupby(['agecat', 'pclass','sex'])['survived'].mean().reindex(
    ['child', 'adolescent', 'adult', 'senior'], level='agecat')

it has no effect on the MultiIndex of the resulting Series. Should this work, or am I using the wrong approach?

Answer 1

Answered by Jeff

You need to provide a MultiIndex that has the rows in the new order:

In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                    ['one', 'two', 'three']],
                            # `labels` was renamed to `codes` in pandas 0.24
                            codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                            names=['first', 'second'])

In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                               columns=Index(['A', 'B', 'C'], name='exp'))

In [38]: df
Out[38]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037

Simulate the reordering by sorting on the second level:

In [39]: idx = df.sort_index(level='second').index  # formerly df.sortlevel; newer pandas may order the level values differently

In [40]: idx
Out[40]: 
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]

In [41]: df.reindex(idx)
Out[41]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037
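
For readers on a current pandas release, the steps above can be collected into a standalone script. This is only a sketch: it assumes pandas 0.24 or later (where the constructor takes `codes` and `sort_index` replaces `sortlevel`), and the seeded generator `rng` is an addition for reproducibility, so the numbers will not match the session above.

```python
import numpy as np
import pandas as pd

# Same index layout as the session above; `codes` is the newer name for `labels`.
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
    names=['first', 'second'],
)

# Seeded generator (an assumption, not in the original) so reruns agree.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), index=index,
                  columns=pd.Index(['A', 'B', 'C'], name='exp'))

# Obtain some reordered MultiIndex, here by sorting on the second level.
idx = df.sort_index(level='second').index

# Reindexing with a full MultiIndex moves whole rows into that order.
reordered = df.reindex(idx)
```

The point of the exercise is that `reindex` receives a complete MultiIndex, one tuple per row, rather than a list of labels for a single level.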

A different ordering:

In [42]: idx = idx[5:].union(idx[:5])  # older pandas spelled this set union as idx[5:] + idx[:5]

In [43]: idx
Out[43]: 
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]

In [44]: df.reindex(idx)
Out[44]: 
exp                  A         B         C
first second                              
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
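
Applied to the question, the same idea might look like the sketch below. It does not use the Titanic data: the Series `s` is a small synthetic stand-in with the same three index levels, and the names `order` and `new_index` are illustrative rather than taken from the original posts.

```python
import pandas as pd

# Desired order for the 'agecat' level.
order = ['child', 'adolescent', 'adult', 'senior']

# Synthetic stand-in for the grouped result (hypothetical values).
index = pd.MultiIndex.from_product(
    [['adolescent', 'adult', 'child', 'senior'], [1, 2], ['female', 'male']],
    names=['agecat', 'pclass', 'sex'],
)
s = pd.Series([float(i) for i in range(len(index))], index=index, name='survived')

# Build a full MultiIndex whose tuples are sorted by the position of
# 'agecat' in `order`, then by the remaining levels.
new_index = pd.MultiIndex.from_tuples(
    sorted(s.index, key=lambda t: (order.index(t[0]), t[1], t[2])),
    names=s.index.names,
)

# Reindex against the complete index to move the rows into that order.
result = s.reindex(new_index)
```

Sorting the existing tuples, instead of generating every combination, keeps only the rows that are actually present, which matters for the real data where some groups (for example senior females in second class) are missing.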
