Python Pandas: Modify a particular level of a MultiIndex

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29150346/

Time: 2020-08-19 04:08:25  Source: igfitidea

Pandas: Modify a particular level of Multiindex

Tags: python, pandas, immutability, multi-index

Question

I have a DataFrame with a MultiIndex and would like to modify one particular level of it. For instance, one level might contain strings, and I may want to remove the whitespace from that index level:

df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]

However, the code above results in an error:

TypeError: 'FrozenList' does not support mutable operations.

I know I can call reset_index, modify the resulting column, and then re-create the MultiIndex, but I wonder whether there is a more elegant way to modify one particular level of the MultiIndex directly.
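For reference, the reset_index round trip mentioned in the question could look roughly like the sketch below. The sample data and the level names "name" and "n" are made up for illustration; they are not from the original question.

```python
import pandas as pd

# Hypothetical sample data; the level names "name" and "n" are assumptions.
index = pd.MultiIndex.from_tuples(
    [("a b", 1), ("a b", 2), ("c d", 1)], names=["name", "n"]
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# 1. Move the index levels into ordinary columns.
flat = df.reset_index()
# 2. Modify the column that used to be the first index level.
flat["name"] = flat["name"].str.replace(" ", "", regex=False)
# 3. Rebuild the MultiIndex from the modified columns.
roundtrip = flat.set_index(["name", "n"])

print(roundtrip)
```

This works, but it touches every row and needs the level names, which is why the answers look for something more direct.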

Answer by Shovalt

As mentioned in the comments, indexes are immutable and must be rebuilt when modified. You do not have to use reset_index for that, though; you can create a new MultiIndex directly:

df.index = pd.MultiIndex.from_tuples([(x[0], x[1].replace(' ', ''), x[2]) for x in df.index])

This example is for a 3-level index in which you want to modify the middle level. For an index with a different number of levels, change the size of the tuple accordingly.
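To avoid hard-coding the tuple size, the same idea can be wrapped in a small helper. This is only a sketch: `replace_in_level` and its parameters are hypothetical names introduced here, and it assumes the chosen level holds strings.

```python
import pandas as pd

def replace_in_level(index, level, old, new):
    """Hypothetical helper: rebuild a MultiIndex, editing one positional level."""
    new_tuples = [
        # Apply str.replace only at the chosen position; keep other entries.
        tuple(v.replace(old, new) if i == level else v for i, v in enumerate(tup))
        for tup in index
    ]
    return pd.MultiIndex.from_tuples(new_tuples, names=index.names)

# Made-up 3-level example; the middle level contains spaces.
index = pd.MultiIndex.from_tuples([("x", "a b", 1), ("x", "c d", 2)])
df = pd.DataFrame({"value": [10, 20]}, index=index)
df.index = replace_in_level(df.index, level=1, old=" ", new="")

print(df)
```

Like the one-liner above, this iterates over every row of the index, so it scales with the number of rows rather than the number of distinct labels.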

Answer by John

Thanks to @cxrodgers's comment, I think the fastest way to do this is:

df.index = df.index.set_levels(df.index.levels[0].str.replace(' ', ''), level=0)
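A self-contained version of that one-liner, using made-up sample data, might look like this. The `regex=False` argument is an addition here to make the literal-space replacement explicit.

```python
import pandas as pd

# Made-up sample data: the first level contains spaces.
index = pd.MultiIndex.from_tuples([("a b", 1), ("a b", 2), ("c d", 1)])
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

# levels[0] holds each distinct label once, so only two strings are edited here.
new_level = df.index.levels[0].str.replace(" ", "", regex=False)

# set_levels returns a new MultiIndex; the original is left untouched.
df.index = df.index.set_levels(new_level, level=0)

print(df)
```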


Old, longer answer:

I found that the list comprehension suggested by @Shovalt works, but it felt slow on my machine (using a dataframe with more than 10,000 rows).

Instead, I was able to use the .set_levels method, which was quite a bit faster for me.

%timeit pd.MultiIndex.from_tuples([(x[0].replace(' ',''), x[1]) for x in df.index])
1 loop, best of 3: 394 ms per loop

%timeit df.index.set_levels(df.index.get_level_values(0).str.replace(' ',''), level=0)
10 loops, best of 3: 134 ms per loop

In practice, I just needed to prepend some text. This was even faster with .set_levels:

%timeit pd.MultiIndex.from_tuples([('00'+x[0], x[1]) for x in df.index])
100 loops, best of 3: 5.18 ms per loop

%timeit df.index.set_levels('00'+df.index.get_level_values(0), level=0)
1000 loops, best of 3: 1.38 ms per loop

%timeit df.index.set_levels('00'+df.index.levels[0], level=0)
1000 loops, best of 3: 331 μs per loop
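The gap between the last two timings is plausibly explained by size: `index.levels[0]` holds each distinct label once, while `get_level_values(0)` has one entry per row. The sketch below uses made-up data to show the difference. As far as I know, `set_levels` expects the replacement to line up with the unique labels, so `levels[0]` is the safer input, and recent pandas versions may reject a per-row array.

```python
import pandas as pd

# Made-up index: four rows, but only two distinct labels in level 0.
index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("a", 3), ("b", 1)])

unique_labels = index.levels[0]              # one entry per distinct label
per_row_labels = index.get_level_values(0)   # one entry per row

print(len(unique_labels), len(per_row_labels))

# Prepend text to the unique labels only, then swap them into the index.
prefixed = index.set_levels("00" + index.levels[0], level=0)
print(prefixed)
```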

This solution is based on the answer in the link from the comment by @denfromufa ...

python - Multiindex and timezone - Frozen list error - Stack Overflow

Answer by normanius

The answers provided are correct. Depending on the structure of the MultiIndex, it can be considerably faster to apply a map directly to the levels instead of constructing a new MultiIndex.

I use the following function to modify a particular index level. It also works on single-level indices.

import pandas as pd

def map_index_level(index, mapper, level=0):
    """
    Returns a new Index or MultiIndex, with the level values being mapped.
    """
    assert isinstance(index, pd.Index)
    if isinstance(index, pd.MultiIndex):
        # Map only the unique labels of the chosen level, then swap them in.
        new_level = index.levels[level].map(mapper)
        new_index = index.set_levels(new_level, level=level)
    else:
        # Single level index.
        assert level == 0
        new_index = index.map(mapper)
    return new_index

Usage:

df = pd.DataFrame([[1,2],[3,4]])
df.index = pd.MultiIndex.from_product([["a"],["i","ii"]])
df.columns = ["x","y"]

df.index = map_index_level(index=df.index, mapper=str.upper, level=1)
df.columns = map_index_level(index=df.columns, mapper={"x":"foo", "y":"bar"})

# Result:
#       foo  bar
# a I     1    2
#   II    3    4
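As a side note that none of the answers above mention: pandas' own `DataFrame.rename` accepts a `level` argument, which may cover the original whitespace use case without a helper. This is a swapped-in alternative rather than any answerer's method, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data: the first level contains spaces.
index = pd.MultiIndex.from_tuples([("a b", 1), ("c d", 2)])
df = pd.DataFrame({"value": [10, 20]}, index=index)

# Apply the function to the labels of level 0 only; returns a new DataFrame.
renamed = df.rename(index=lambda s: s.replace(" ", ""), level=0)

print(renamed)
```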