Python: add a level to a pandas MultiIndex

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14744068/

Time: 2020-08-18 12:18:04  Source: igfitidea

Prepend a level to a pandas MultiIndex

python pandas

Asked by Yawar

I have a DataFrame with a MultiIndex created after some grouping:

import pandas as pd
from numpy.random import randn

df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': randn(4),
}).groupby(['A', 'B']).sum()

df

Output>            Vals
Output> A  B           
Output> a1 b1 -1.632460
Output>    b2  0.596027
Output> a2 b3 -0.619130
Output> a3 b4 -0.002009

How do I prepend a level to the MultiIndex so that I turn it into something like:

Output>                       Vals
Output> FirstLevel A  B           
Output> Foo        a1 b1 -1.632460
Output>               b2  0.596027
Output>            a2 b3 -0.619130
Output>            a3 b4 -0.002009

Accepted answer by okartal

A nice way to do this in one line is with pandas.concat():

import pandas as pd

pd.concat([df], keys=['Foo'], names=['Firstlevel'])

This generalizes to many data frames; see the pandas.concat docs.
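As a rough sketch of that generalization: passing several frames with one key each gives every frame its own label in the new outer level. The helper `make_frame`, the frame names `df_foo` and `df_bar`, and all values are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def make_frame(vals):
    """Build a small frame indexed by (A, B); a hypothetical helper for this sketch."""
    return pd.DataFrame(
        {"A": ["a1", "a1", "a2"], "B": ["b1", "b2", "b3"], "Vals": vals}
    ).groupby(["A", "B"]).sum()


df_foo = make_frame([1.0, 2.0, 3.0])
df_bar = make_frame([10.0, 20.0, 30.0])

# One key per frame. `names` labels only the new outermost level here;
# the existing level names (A, B) are carried over from the frames.
combined = pd.concat([df_foo, df_bar], keys=["Foo", "Bar"], names=["Firstlevel"])
print(combined)
```

Rows from each frame can then be selected by their key, for example `combined.loc["Bar"]`.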

Answer by Rutger Kassies

You can first add it as a normal column and then append it to the current index, so:

df['Firstlevel'] = 'Foo'
df.set_index('Firstlevel', append=True, inplace=True)

And change the order if needed with:

df = df.reorder_levels(['Firstlevel', 'A', 'B'])  # returns a new object, so assign it back

Which results in:

                      Vals
Firstlevel A  B           
Foo        a1 b1  0.871563
              b2  0.494001
           a2 b3 -0.167811
           a3 b4 -1.353409
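Put together, the column-then-index approach might look like the sketch below. The fixed `Vals` numbers replace the random data purely so the result is reproducible, and the non-inplace `set_index` call is a stylistic choice of this sketch rather than something the answer prescribes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a1", "a1", "a2", "a3"],
        "B": ["b1", "b2", "b3", "b4"],
        "Vals": [1.0, 2.0, 3.0, 4.0],  # illustrative values
    }
).groupby(["A", "B"]).sum()

# Add the constant as an ordinary column, then append it to the index.
# With append=True the new level ends up innermost (last).
df["Firstlevel"] = "Foo"
df = df.set_index("Firstlevel", append=True)
names_after_append = list(df.index.names)

# reorder_levels does not modify df in place, so the result is assigned back.
df = df.reorder_levels(["Firstlevel", "A", "B"])
names_after_reorder = list(df.index.names)

print(names_after_append)
print(names_after_reorder)
```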

Answer by cxrodgers

I think this is a more general solution:

import pandas as pd

# Convert index to dataframe
old_idx = df.index.to_frame()

# Insert new level at specified location
# (new_level_values is a placeholder: a scalar or one value per row)
old_idx.insert(0, 'new_level_name', new_level_values)

# Convert back to MultiIndex
df.index = pd.MultiIndex.from_frame(old_idx)

Some advantages over the other answers:

  • The new level can be added at any location, not just the top.
  • It is purely a manipulation on the index and doesn't require manipulating the data, like the concatenation trick.
  • It doesn't require adding a column as an intermediate step, which can break multi-level column indexes.
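To illustrate the first point, here is a minimal sketch that inserts a level between A and B instead of at the top. The level name `Mid` and its values are assumptions chosen for the example, and the fixed `Vals` numbers stand in for the random data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a1", "a1", "a2", "a3"],
        "B": ["b1", "b2", "b3", "b4"],
        "Vals": [1.0, 2.0, 3.0, 4.0],  # illustrative values
    }
).groupby(["A", "B"]).sum()

# Hypothetical values for the new level: one label per row.
new_level_values = ["x", "y", "x", "y"]

idx_frame = df.index.to_frame()                # one column per level: A, B
idx_frame.insert(1, "Mid", new_level_values)   # position 1 = between A and B
df.index = pd.MultiIndex.from_frame(idx_frame)

print(df)
```

Only the index is rebuilt; the `Vals` column is left as it was.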

Answer by Sam De Meyer

I made a little function out of cxrodgers' answer, which IMHO is the best solution since it works purely on an index, independent of any data frame or series.

There is one fix I added: the to_frame() method will invent new names for index levels that don't have one. As such, the new index will have names that don't exist in the old index. I added some code to revert this name change.
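The renaming can be reproduced with a small sketch like the one below. The index values are made up, and the exact placeholder labels (positional integers in the pandas versions I have seen) should be treated as an assumption rather than a guarantee.

```python
import pandas as pd

# A MultiIndex whose levels have no names.
unnamed = pd.MultiIndex.from_arrays([["a1", "a2"], ["b1", "b2"]])
print(unnamed.names)

# to_frame() needs column labels, so unnamed levels get placeholder labels.
frame = unnamed.to_frame()
print(list(frame.columns))

# Round-tripping through the frame keeps those placeholders as level names...
round_trip = pd.MultiIndex.from_frame(frame)
print(round_trip.names)

# ...unless the original names are passed back explicitly.
restored = pd.MultiIndex.from_frame(frame, names=list(unnamed.names))
print(restored.names)
```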

Below is the code. I've used it myself for a while and it seems to work fine. If you find any issues or edge cases, I'd be happy to adjust my answer.

from typing import Any

import pandas as pd

def _handle_insert_loc(loc: int, n: int) -> int:
    """
    Computes the insert index from the right if loc is negative for a given size of n.
    """
    return n + loc + 1 if loc < 0 else loc


def add_index_level(old_index: pd.Index, value: Any, name: str = None, loc: int = 0) -> pd.MultiIndex:
    """
    Expand a (multi)index by adding a level to it.

    :param old_index: The index to expand
    :param name: The name of the new index level
    :param value: Scalar or list-like, the values of the new index level
    :param loc: Where to insert the level in the index, 0 is at the front, negative values count back from the rear end
    :return: A new multi-index with the new level added
    """
    loc = _handle_insert_loc(loc, len(old_index.names))
    old_index_df = old_index.to_frame()
    old_index_df.insert(loc, name, value)
    new_index_names = list(old_index.names)  # sometimes new index level names are invented when converting to a df,
    new_index_names.insert(loc, name)        # here the original names are reconstructed
    new_index = pd.MultiIndex.from_frame(old_index_df, names=new_index_names)
    return new_index

It passed the following unittest code:

import unittest

import numpy as np
import pandas as pd

class TestPandaStuff(unittest.TestCase):

    def test_add_index_level(self):
        df = pd.DataFrame(data=np.random.normal(size=(6, 3)))
        i1 = add_index_level(df.index, "foo")

        # it does not invent index names where they are missing
        self.assertEqual([None, None], i1.names)

        # the new level values are added
        self.assertTrue(np.all(i1.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i1.get_level_values(1) == df.index))

        # it still does not invent index names after adding named levels
        i2 = add_index_level(i1, ["x", "y"]*3, name="xy", loc=2)
        i3 = add_index_level(i2, ["a", "b", "c"]*2, name="abc", loc=-1)
        self.assertEqual([None, None, "xy", "abc"], i3.names)

        # the new level values are added
        self.assertTrue(np.all(i3.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i3.get_level_values(1) == df.index))
        self.assertTrue(np.all(i3.get_level_values(2) == ["x", "y"]*3))
        self.assertTrue(np.all(i3.get_level_values(3) == ["a", "b", "c"]*2))

        # df.index = i3
        # print()
        # print(df)