Python Pandas:多级列名

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/21443963/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 22:52:05  来源:igfitidea点击:

Pandas: Multilevel column names

pythonpandas

提问by LondonRob

pandashas support for multi-level column names:

pandas支持多级列名:

>>>  x = pd.DataFrame({'instance':['first','first','first'],'foo':['a','b','c'],'bar':rand(3)})
>>> x = x.set_index(['instance','foo']).transpose()
>>> x.columns
MultiIndex
[(u'first', u'a'), (u'first', u'b'), (u'first', u'c')]
>>> x
instance     first                    
foo              a         b         c
bar       0.102885  0.937838  0.907467

This feature is very useful since it allows multiple versions of the same dataframe to be appended 'horizontally' with the 1st level of the column names (in my example instance) distinguishing the instances.

此功能非常有用,因为它允许将同一数据帧的多个版本“水平”附加到第一级列名(在我的示例中instance)以区分实例。

Imagine I already have a dataframe like this:

想象一下,我已经有一个这样的数据框:

                 a         b         c
bar       0.102885  0.937838  0.907467

Is there a nice way to add another level to the column names, similar to this for row index:

有没有一种很好的方法可以为列名添加另一个级别,类似于行索引:

x['instance'] = 'first'
x.set_level('instance',append=True)

采纳答案by Ian Zurutuza

No need to create a list of tuples

无需创建元组列表

Use: pd.MultiIndex.from_product(iterables)

用: pd.MultiIndex.from_product(iterables)

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.Multiindex.from_product([["new_label"], df.columns])

Resultant DataFrame:

结果数据帧:

  new_label                    
          a         b         c
0   0.25999  0.337535  0.333568

Pull request from Jan 25, 2014

2014 年 1 月 25 日的拉取请求

回答by user3377361

Try this:

尝试这个:

df=pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

columns=[('c','a'),('c','b')]

df.columns=pd.MultiIndex.from_tuples(columns)

回答by Carl

You can use concat. Give it a dictionary of dataframes where the key is the new column level you want to add.

您可以使用concat. 给它一个数据框字典,其中键是您要添加的新列级别。

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
Out[48]:
    first_level
              a     b     c
idx
10         0.89  0.98  0.31
20         0.34  0.78  0.34

You can use the same technique to create multiple levels.

您可以使用相同的技术来创建多个级别。

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
Out[50]:
    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98

回答by Charl

Here is a function that can help you create the tuple, that can be used by pd.MultiIndex.from_tuples(), a bit more generically. Got the idea from @user3377361.

这是一个可以帮助您创建元组的函数,它可以由 pd.MultiIndex.from_tuples() 使用,更通用一点。从@user3377361 得到这个想法。

def create_tuple_for_for_columns(df_a, multi_level_col):
    """
    Create a columns tuple that can be pandas MultiIndex to create multi level column

    :param df_a: pandas dataframe containing the columns that must form the first level of the multi index
    :param multi_level_col: name of second level column
    :return: tuple containing (second_level_col, firs_level_cols)
    """
    temp_columns = []
    for item in df_a.columns:
        temp_columns.append((multi_level_col, item))
    return temp_columns

It can be used like this:

它可以像这样使用:

df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
columns = create_tuple_for_for_columns(df, 'c')
df.columns = pd.MultiIndex.from_tuples(columns)