Make multiindex columns in a pandas dataframe
Original source: http://stackoverflow.com/questions/35760223/
This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Asked by John
I have a pandas dataframe with the following structure:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4),
                  columns=['male ; 0', 'male ; 1', 'male ; 2', 'male ; 3',
                           'female ; 0', 'female ; 1', 'female ; 2', 'female ; 3'])
The column names are messy: each header combines two variables, along with residual punctuation from the original spreadsheet.
What I want to do is give my dataframe a column MultiIndex with two levels, named sex and age.
I tried using pd.MultiIndex.from_tuples like this:
columns = [('Male', 0),('Male', 1),('Male', 2),('Male', 3),('Female', 0),('Female', 1),('Female', 2),('Female', 3)]
df.columns = pd.MultiIndex.from_tuples(columns)
I then named the column levels:
df.columns.names = ['Sex', 'Age']
This gives the result I want. However, my real dataframes have ages running to over 100 for each sex, so writing the tuples out by hand is not practical.
Could someone please guide me on how to set MultiIndex columns from tuples programmatically?
Answered by Def_Os
Jaco's answer works nicely, but you can even create a MultiIndex from a product directly, using .from_product():
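sex = ['Male', 'Female']
age = range(100)
# One column per (sex, age) pair, so len(sex) * len(age) must equal
# the number of columns in df (use range(4) for the 8-column example above).
df.columns = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])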
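To make the equivalence concrete, here is a minimal sketch that applies .from_product() to the 8-column frame from the question and compares it with the hand-written tuples. It assumes ages 0 to 3 to match that example; the names expected, same, and male are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the example frame in the question: 4 rows, 8 columns.
df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4))

sex = ['Male', 'Female']
age = range(4)  # assumption: ages 0..3, matching the 8-column example

# Cartesian product of the two level lists, with level names set in one step.
df.columns = pd.MultiIndex.from_product([sex, age], names=['Sex', 'Age'])

# The same index written out as explicit tuples, for comparison.
expected = pd.MultiIndex.from_tuples(
    [(s, a) for s in sex for a in age], names=['Sex', 'Age'])
same = df.columns.equals(expected)

# Selecting on the outer level returns the four age columns for that sex.
male = df['Male']
```

The order of the lists passed to .from_product() matters: the first list varies slowest, so all 'Male' columns come before all 'Female' columns, which is the layout of the original data.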
Answered by Alex
You can use the itertools module to generate your columns variable by taking the Cartesian product of the sexes and the age range in your data, for example:
import itertools
max_age = 100
sex = ['Male','Female']
age = range(max_age)
columns = list(itertools.product(sex, age))
df.columns = pd.MultiIndex.from_tuples(columns)
df.columns.names = ['Sex', 'Age']
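Both approaches above rebuild the labels from known lists of sexes and ages. If the ages in the data are irregular, another option is to derive the tuples from the existing headers themselves. The following is only a sketch and not part of the original answers: it assumes every header has the form 'sex ; age' with an integer age, and split_label is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

labels = ['male ; 0', 'male ; 1', 'male ; 2', 'male ; 3',
          'female ; 0', 'female ; 1', 'female ; 2', 'female ; 3']
df = pd.DataFrame(np.arange(32).reshape((4, 8)),
                  index=pd.date_range('2016-01-01', periods=4),
                  columns=labels)

def split_label(label):
    """Turn a header such as 'male ; 0' into the tuple ('Male', 0)."""
    sex, age = label.split(';')
    return sex.strip().capitalize(), int(age)

# Build the MultiIndex from the parsed headers, keeping the column order.
df.columns = pd.MultiIndex.from_tuples(
    [split_label(c) for c in df.columns], names=['Sex', 'Age'])
```

Because the tuples come from the headers in their existing order, each column keeps its data, and a header that does not fit the assumed pattern raises an error instead of being silently mislabeled.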