pandas: Creating a DataFrame with hierarchical columns
Original URL: http://stackoverflow.com/questions/17985159/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Alex Rothberg
What is the easiest way to create a DataFrame with hierarchical columns?
I am currently creating a DataFrame from a dict of names -> Series using:
import pandas as pd
df = pd.DataFrame(data=serieses)  # serieses: dict mapping name -> Series
I would like to keep the same column names but add an additional level of hierarchy to the columns. For now, I want the additional level to have the same value for every column, let's say "Estimates".
I am trying the following, but it does not seem to work:
pd.DataFrame(data=serieses,columns=pd.MultiIndex.from_tuples([(x, "Estimates") for x in serieses.keys()]))
All I get is a DataFrame full of NaNs.
For example, what I am looking for is roughly:
l1               Estimates    
l2  one  two  one  two  one  two  one  two
r1   1    2    3    4    5    6    7    8
r2   1.1  2    3    4    5    6    71   8.2
where l1 and l2 are the names of the MultiIndex levels.
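A minimal sketch of one way to get this layout, assuming `serieses` is a plain dict of name -> Series (the sample values below are made up for illustration): build the flat DataFrame exactly as before, then replace its columns with a two-level `MultiIndex` from `pd.MultiIndex.from_product`, which pairs the single label "Estimates" with every existing column name. Note that the outer level comes first in each resulting tuple, whereas the attempt above puts "Estimates" second.

```python
import pandas as pd

# Hypothetical sample input: a dict mapping column names to Series
serieses = {
    "one": pd.Series([1, 1.1], index=["r1", "r2"]),
    "two": pd.Series([2, 2], index=["r1", "r2"]),
}

# Build the flat DataFrame exactly as in the question
df = pd.DataFrame(data=serieses)

# Put a constant outer level "Estimates" above the existing column names
df.columns = pd.MultiIndex.from_product(
    [["Estimates"], df.columns], names=["l1", "l2"]
)
print(df)
```

Because the columns are replaced after construction, the dict-based constructor never has to match tuple labels against the dict keys.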
Answer by Alex Rothberg
This appears to work:
import pandas as pd
data = {'a': [1,2,3,4], 'b': [10,20,30,40],'c': [100,200,300,400]}
df = pd.concat({"Estimates": pd.DataFrame(data)}, axis=1, names=["l1", "l2"])
l1  Estimates         
l2          a   b    c
0           1  10  100
1           2  20  200
2           3  30  300
3           4  40  400
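The same `pd.concat` approach can be applied to a dict of Series like the one in the question. The sketch below (sample data assumed) builds the flat frame first and then wraps it; passing a list of frames together with `keys=["Estimates"]` is an equivalent way of spelling the dict form.

```python
import pandas as pd

# Hypothetical dict of name -> Series, as in the question
serieses = {
    "a": pd.Series([1, 2, 3, 4]),
    "b": pd.Series([10, 20, 30, 40]),
}
flat = pd.DataFrame(serieses)

# Dict form: the dict key becomes the outer column level
via_dict = pd.concat({"Estimates": flat}, axis=1, names=["l1", "l2"])

# List + keys form: same result, with the outer label given explicitly
via_keys = pd.concat([flat], keys=["Estimates"], axis=1, names=["l1", "l2"])

print(via_dict)
print(via_dict.equals(via_keys))
```

The dict form scales naturally if you later want several outer groups, since each key becomes one top-level label.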
Answer by DimG
I know the question is really old, but with pandas version 0.19.1 one can use direct dict initialization with tuple keys:
import pandas as pd
d = {('a','b'):[1,2,3,4], ('a','c'):[5,6,7,8]}
df = pd.DataFrame(d, index=['r1','r2','r3','r4'])
df.columns.names = ('l1','l2')
print(df)
l1  a   
l2  b  c
r1  1  5
r2  2  6
r3  3  7
r4  4  8
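Applied to the question's setup, a minimal sketch (the `serieses` sample below is hypothetical) is to re-key each entry with an `("Estimates", name)` tuple before constructing the DataFrame; pandas turns a set of tuple keys into `MultiIndex` columns.

```python
import pandas as pd

# Hypothetical dict of name -> Series, as in the question
idx = ["r1", "r2", "r3", "r4"]
serieses = {
    "a": pd.Series([1, 2, 3, 4], index=idx),
    "b": pd.Series([10, 20, 30, 40], index=idx),
}

# Re-key every entry with an ("Estimates", name) tuple;
# tuple keys become a two-level column MultiIndex
df = pd.DataFrame({("Estimates", name): s for name, s in serieses.items()})
df.columns.names = ["l1", "l2"]
print(df)
```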
Answer by Rutger Kassies
I'm not sure, but I think using a dict as input for your DataFrame together with a MultiIndex doesn't play well. Using an array as input instead makes it work.
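One likely reason for the NaNs: when `data` is a dict, the `columns` argument of the `DataFrame` constructor selects dict keys by label rather than renaming them, so any requested label that is not a key comes back as missing. Tuples such as `("a", "Estimates")` never equal the plain string keys. The small sketch below (made-up data) shows this selection behavior with flat labels.

```python
import pandas as pd

data = {"a": [1, 2], "b": [3, 4]}

# "a" is a key of the dict, so it is selected; "z" is not, so it is all NaN
df = pd.DataFrame(data, columns=["a", "z"])
print(df)

# Labels are matched, not renamed: "b" is dropped because it was not requested
print(list(df.columns))
```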
I often prefer dicts as input, though. One way is to set the columns after creating the DataFrame:
import numpy as np
import pandas as pd
data = {'a': [1,2,3,4], 'b': [10,20,30,40],'c': [100,200,300,400]}
# Stack the dict values as columns (list() is needed on Python 3)
df = pd.DataFrame(np.array(list(data.values())).T, index=['r1','r2','r3','r4'])
tups = list(zip(['Estimates']*len(data), data.keys()))
df.columns = pd.MultiIndex.from_tuples(tups, names=['l1','l2'])
l1          Estimates         
l2          a   b    c
r1          1  10  100
r2          2  20  200
r3          3  30  300
r4          4  40  400
Or, when using an array as input for the DataFrame:
import numpy as np
import pandas as pd
data_arr = np.array([[1,2,3,4],[10,20,30,40],[100,200,300,400]])
tups = list(zip(['Estimates']*data_arr.shape[0], ['a','b','c']))
df = pd.DataFrame(data_arr.T, index=['r1','r2','r3','r4'], columns=pd.MultiIndex.from_tuples(tups, names=['l1','l2']))
This gives the same result.
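To check that both versions really give the same result, the sketch below is a Python 3 rewrite of the two snippets in this answer, with the tuple lists written as comprehensions instead of `zip`. It builds both frames and compares them with `pandas.testing.assert_frame_equal`, which raises if values, labels, level names, or dtypes differ.

```python
import numpy as np
import pandas as pd

index = ["r1", "r2", "r3", "r4"]
data = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [100, 200, 300, 400]}

# Approach 1: dict input, MultiIndex columns assigned after construction
df1 = pd.DataFrame(np.array(list(data.values())).T, index=index)
df1.columns = pd.MultiIndex.from_tuples(
    [("Estimates", k) for k in data], names=["l1", "l2"]
)

# Approach 2: array input, MultiIndex passed directly to the constructor
data_arr = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
df2 = pd.DataFrame(
    data_arr.T,
    index=index,
    columns=pd.MultiIndex.from_tuples(
        [("Estimates", k) for k in ["a", "b", "c"]], names=["l1", "l2"]
    ),
)

# Raises AssertionError if the two frames differ in any way
pd.testing.assert_frame_equal(df1, df2)
print("same result")
```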

