Python Pandas: drop a level from a multi-level column index?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/22233488/

Date: 2020-08-19 00:32:07  Source: igfitidea

Tags: python, pandas

Asked by David Wolever

If I've got a multi-level column index:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> pd.DataFrame([[1,2], [3,4]], columns=cols)
    a
   ---+--
    b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

How can I drop the "a" level of that index, so I end up with:

    b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

Answered by DSM

You can use MultiIndex.droplevel:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> df = pd.DataFrame([[1,2], [3,4]], columns=cols)
>>> df
   a   
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]
>>> df.columns = df.columns.droplevel()
>>> df
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]
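
When the column levels are named, the level to drop can also be given by name rather than by position. The sketch below assumes hypothetical level names "upper" and "lower", which are not part of the original question:

```python
import pandas as pd

# "upper" and "lower" are made-up level names used only for illustration.
cols = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c")], names=["upper", "lower"]
)
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# Work on a copy so the original two-level frame stays available.
flat = df.copy()

# Drop the outer level by name instead of relying on the default level=0.
flat.columns = flat.columns.droplevel("upper")

print(list(flat.columns))
```

Naming the level makes the intent explicit and keeps working if the level order changes.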

Answered by sedeh

You could also achieve that by renaming the columns:

df.columns = ['b', 'c']

This involves a manual step, but it could be an option, especially if you would eventually rename the columns of your data frame anyway.

Answered by spacetyper

Another way to do this is to reassign df based on a cross section of df, using the .xs method.

>>> df

    a
    b   c
0   1   2
1   3   4

>>> df = df.xs('a', axis=1, drop_level=True)

    # 'a' : key on which to get cross section
    # axis=1 : get cross section of column
    # drop_level=True : returns cross section without the multilevel index

>>> df

    b   c
0   1   2
1   3   4
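
One thing to keep in mind is that .xs selects as well as flattens: only the columns under the given key survive. A rough sketch, assuming a hypothetical frame `wide` with a second top-level key "z" that is not in the original question:

```python
import pandas as pd

# "wide" and the extra ("z", "d") column are illustrative assumptions.
cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c"), ("z", "d")])
wide = pd.DataFrame([[1, 2, 5], [3, 4, 6]], columns=cols)

# xs keeps the columns under "a" and drops that level;
# the column under "z" is not part of the result.
only_a = wide.xs("a", axis=1, drop_level=True)
print(list(only_a.columns))

# Dropping the level on the column index keeps every column.
all_labels = wide.columns.droplevel(0)
print(list(all_labels))
```

So .xs fits the question's frame, where "a" is the only top-level key, but it is not a general replacement for droplevel.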

Answered by Mint

Another way to drop the level is to use a list comprehension:

df.columns = [col[1] for col in df.columns]

   b  c
0  1  2
1  3  4

This strategy is also useful if you want to combine the names from both levels like in the example below where the bottom level contains two 'y's:

cols = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "y")])
df = pd.DataFrame([[1,2, 8 ], [3,4, 9]], columns=cols)

   A     B
   x  y  y
0  1  2  8
1  3  4  9

Dropping the top level would leave two columns with the label 'y'. That can be avoided by joining the names with the list comprehension.

df.columns = ['_'.join(col) for col in df.columns]

    A_x A_y B_y
0   1   2   8
1   3   4   9

That's a problem I had after doing a groupby, and it took a while to find this other question that solved it. I adapted that solution to the specific case here.
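
The join above relies on every label being a string. If some levels hold numbers, a variant along these lines might work; the integer labels here are an assumption made for illustration:

```python
import pandas as pd

# Inner level uses integers (hypothetical) instead of 'x' / 'y'.
cols = pd.MultiIndex.from_tuples([("A", 1), ("A", 2), ("B", 2)])
df = pd.DataFrame([[1, 2, 8], [3, 4, 9]], columns=cols)

# str.join only accepts strings, so convert each label first.
df.columns = ["_".join(str(part) for part in col) for col in df.columns]

print(list(df.columns))  # ['A_1', 'A_2', 'B_2']
```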

Answered by dhFrank

I struggled with this problem because I did not know why droplevel() was not working for me. After working through several attempts, I learned that 'a' in your table may be the name of the columns, while 'b' and 'c' are the index. Doing the following will help:

df.columns.name = None
df = df.reset_index()  # turn the index into a regular column (reset_index returns a new frame)
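
One possible reading of this answer is that a name on a single-level column index can be mistaken for an extra header level when the frame is printed. The sketch below builds such a frame by hand (the construction is an assumption, not taken from the answer) and checks the number of levels before clearing the name:

```python
import pandas as pd

df = pd.DataFrame({"b": [1, 3], "c": [2, 4]})
df.columns.name = "a"  # a name on a single-level column index

# Only one level exists, so there is no second level for droplevel to remove.
print(df.columns.nlevels)

# Clearing the name removes the stray "a" from the printed header.
df.columns.name = None
print(list(df.columns))
```

Checking `df.columns.nlevels` first tells you whether you are dealing with a real MultiIndex or just a named index.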

Answered by YOBEN_S

A small trick using sum with level=1 (works when the labels at level=1 are all unique):

df.sum(level=1,axis=1)
Out[202]: 
   b  c
0  1  2
1  3  4
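
The `level` argument of `sum` was deprecated in pandas 1.3 and removed in 2.0, so the call above fails on recent versions. A rough equivalent, offered as a sketch rather than as the answer's own method, groups the transposed frame by the inner level:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

# Transpose so the column levels become row levels, group by the inner
# level, sum each group, and transpose back.
result = df.T.groupby(level=1).sum().T

print(list(result.columns))  # ['b', 'c']
```

As with the original trick, this only reproduces the data unchanged when the inner labels are unique; repeated labels would be summed together.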


A more common solution is get_level_values:

df.columns=df.columns.get_level_values(1)
df
Out[206]: 
   b  c
0  1  2
1  3  4

Answered by jxc

As of Pandas 0.24.0, we can now use DataFrame.droplevel():

cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
df = pd.DataFrame([[1,2], [3,4]], columns=cols)

df.droplevel(0, axis=1) 

#   b  c
#0  1  2
#1  3  4

This is very useful if you want to keep your DataFrame method-chain rolling.
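
A minimal sketch of such a chain is shown below. The `rename` and `assign` steps and the `total` column are hypothetical follow-up operations added only to illustrate chaining:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
df = pd.DataFrame([[1, 2], [3, 4]], columns=cols)

result = (
    df.droplevel(0, axis=1)       # flatten the columns (pandas >= 0.24)
      .rename(columns=str.upper)  # hypothetical next step: 'b' -> 'B'
      .assign(total=lambda d: d["B"] + d["C"])  # hypothetical derived column
)

print(result)
```

Because `droplevel` returns a new DataFrame, no intermediate assignment to `df.columns` is needed and `df` itself is left unchanged.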

Answered by Shubham Joshi

A super simple one-line answer:

df.columns = [df.columns.get_level_values(0)[i] + '_' + df.columns.get_level_values(1)[i] for i in range(len(df.columns))]

This will give you a data frame with:

   a_b  b_c
0    1    2
1    3    4