Python: Adding a level to a Pandas MultiIndex
Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14744068/
Prepend a level to a pandas MultiIndex
Asked by Yawar
I have a DataFrame with a MultiIndex created after some grouping:
import numpy as np
import pandas as p
from numpy.random import randn
df = p.DataFrame({
    'A' : ['a1', 'a1', 'a2', 'a3']
  , 'B' : ['b1', 'b2', 'b3', 'b4']
  , 'Vals' : randn(4)
}).groupby(['A', 'B']).sum()
df
Output>            Vals
Output> A  B
Output> a1 b1 -1.632460
Output>    b2  0.596027
Output> a2 b3 -0.619130
Output> a3 b4 -0.002009
How do I prepend a level to the MultiIndex so that I turn it into something like:
Output>                      Vals
Output> FirstLevel A  B
Output> Foo        a1 b1 -1.632460
Output>               b2  0.596027
Output>            a2 b3 -0.619130
Output>            a3 b4 -0.002009
Accepted answer by okartal
Answer by Rutger Kassies
You can first add it as a normal column and then append it to the current index, so:
df['Firstlevel'] = 'Foo'
df.set_index('Firstlevel', append=True, inplace=True)
And change the order if needed with:
# reorder_levels returns a new DataFrame, so assign the result
df = df.reorder_levels(['Firstlevel', 'A', 'B'])
Which results in:
                      Vals
Firstlevel A  B
Foo        a1 b1  0.871563
              b2  0.494001
           a2 b3 -0.167811
           a3 b4 -1.353409
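For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes fixed numbers in place of randn(4) so the result is reproducible, and the variable names appended and reordered are only illustrative. It shows that set_index with append=True puts the new level last, which is why the reorder step is needed.

```python
import pandas as pd

# Fixed values stand in for randn(4) so the result is reproducible.
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a3"],
    "B": ["b1", "b2", "b3", "b4"],
    "Vals": [1.0, 2.0, 3.0, 4.0],
}).groupby(["A", "B"]).sum()

# Add the new level as an ordinary column first.
df["Firstlevel"] = "Foo"

# append=True keeps A and B and adds Firstlevel as the innermost level.
appended = df.set_index("Firstlevel", append=True)

# reorder_levels returns a new frame with Firstlevel moved to the front.
reordered = appended.reorder_levels(["Firstlevel", "A", "B"])

print(list(appended.index.names))   # ['A', 'B', 'Firstlevel']
print(list(reordered.index.names))  # ['Firstlevel', 'A', 'B']
```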
Answer by cxrodgers
I think this is a more general solution:
import pandas
# Convert index to dataframe
old_idx = df.index.to_frame()
# Insert new level at specified location
# (new_level_values is a scalar or a list-like with one entry per row)
old_idx.insert(0, 'new_level_name', new_level_values)
# Convert back to MultiIndex
df.index = pandas.MultiIndex.from_frame(old_idx)
Some advantages over the other answers:
- The new level can be added at any location, not just the top.
- It is purely a manipulation on the index and doesn't require manipulating the data, like the concatenation trick.
- It doesn't require adding a column as an intermediate step, which can break multi-level column indexes.
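To make the comparison concrete, here is a rough sketch of both routes. The level name "Middle" and the fixed values are made up for illustration. The "concatenation trick" is not reproduced on this page; the sketch assumes it means pd.concat with the keys and names arguments, which is a common way to prepend a level but may not be exactly what the other answer wrote.

```python
import pandas as pd

# Fixed values stand in for randn(4) so the result is reproducible.
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a3"],
    "B": ["b1", "b2", "b3", "b4"],
    "Vals": [1.0, 2.0, 3.0, 4.0],
}).groupby(["A", "B"]).sum()

# Index-only route: insert a level between A and B (position 1).
idx = df.index.to_frame()
idx.insert(1, "Middle", "Foo")  # "Middle" is a hypothetical level name
by_frame = df.copy()
by_frame.index = pd.MultiIndex.from_frame(idx)

# Assumed form of the concatenation trick: it rebuilds the whole frame
# and can only put the new level at the front.
by_concat = pd.concat([df], keys=["Foo"], names=["Firstlevel"])

print(list(by_frame.index.names))
print(list(by_concat.index.names))
```

The first route never touches the Vals column, while the second goes through the data to get the same kind of index.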
Answer by Sam De Meyer
I made a little function out of cxrodgers' answer, which IMHO is the best solution since it works purely on an index, independent of any data frame or series.
There is one fix I added: the to_frame() method will invent new names for index levels that don't have one. As a result, the new index would have names that don't exist in the old index. I added some code to revert this name change.
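A small sketch of the renaming issue, assuming a two-level index built with from_arrays and no names. The variable names are only for illustration. Passing the original names back to from_frame is the same idea the function below relies on.

```python
import pandas as pd

# A MultiIndex whose levels have no names.
unnamed = pd.MultiIndex.from_arrays([["a", "b"], [1, 2]])

# to_frame() needs column labels, so it falls back to level positions.
frame = unnamed.to_frame()

# Round-tripping without names keeps those invented labels.
round_trip = pd.MultiIndex.from_frame(frame)

# Passing the original names explicitly restores them.
restored = pd.MultiIndex.from_frame(frame, names=unnamed.names)

print(list(unnamed.names))
print(list(round_trip.names))
print(list(restored.names))
```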
Below is the code. I've used it myself for a while and it seems to work fine. If you find any issues or edge cases, I'd be happy to adjust my answer.
from typing import Any, Optional

import pandas as pd


def _handle_insert_loc(loc: int, n: int) -> int:
    """
    Computes the insert index from the right if loc is negative for a given size of n.
    """
    return n + loc + 1 if loc < 0 else loc


def add_index_level(old_index: pd.Index, value: Any, name: Optional[str] = None, loc: int = 0) -> pd.MultiIndex:
    """
    Expand a (multi)index by adding a level to it.
    :param old_index: The index to expand
    :param value: Scalar or list-like, the values of the new index level
    :param name: The name of the new index level
    :param loc: Where to insert the level in the index, 0 is at the front, negative values count back from the rear end
    :return: A new multi-index with the new level added
    """
    loc = _handle_insert_loc(loc, len(old_index.names))
    old_index_df = old_index.to_frame()
    old_index_df.insert(loc, name, value)
    # Sometimes new index level names are invented when converting to a df,
    # so the original names are reconstructed here.
    new_index_names = list(old_index.names)
    new_index_names.insert(loc, name)
    new_index = pd.MultiIndex.from_frame(old_index_df, names=new_index_names)
    return new_index
It passed the following unittest code:
import unittest

import numpy as np
import pandas as pd


class TestPandaStuff(unittest.TestCase):

    def test_add_index_level(self):
        df = pd.DataFrame(data=np.random.normal(size=(6, 3)))
        i1 = add_index_level(df.index, "foo")
        # it does not invent new index names where they are missing
        self.assertEqual([None, None], list(i1.names))
        # the new level values are added
        self.assertTrue(np.all(i1.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i1.get_level_values(1) == df.index))
        # names stay untouched when further levels are added by position
        i2 = add_index_level(i1, ["x", "y"]*3, name="xy", loc=2)
        i3 = add_index_level(i2, ["a", "b", "c"]*2, name="abc", loc=-1)
        self.assertEqual([None, None, "xy", "abc"], list(i3.names))
        # the new level values are added
        self.assertTrue(np.all(i3.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i3.get_level_values(1) == df.index))
        self.assertTrue(np.all(i3.get_level_values(2) == ["x", "y"]*3))
        self.assertTrue(np.all(i3.get_level_values(3) == ["a", "b", "c"]*2))
        # df.index = i3
        # print()
        # print(df)

