pandas: Create a DataFrame with hierarchical columns

Question

Asked by Alex Rothberg

What is the easiest way to create a DataFrame with hierarchical columns?

I am currently creating a DataFrame from a dict mapping names to Series using:

df = pd.DataFrame(data=serieses)

I would like to keep the same column names but add an additional level of hierarchy to the columns. For the time being I want the additional level to have the same value for all columns, say "Estimates".

I am trying the following but that does not seem to work:

pd.DataFrame(data=serieses,columns=pd.MultiIndex.from_tuples([(x, "Estimates") for x in serieses.keys()]))

All I get is a DataFrame with all NaNs.

For example, what I am looking for is roughly:

l1               Estimates    
l2  one  two  one  two  one  two  one  two
r1   1    2    3    4    5    6    7    8
r2   1.1  2    3    4    5    6    71   8.2

where l1 and l2 are the names of the MultiIndex levels.
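
The empty result in the attempt above has a likely explanation: when `data` is a dict, pandas treats `columns` as a selection of existing dict keys rather than as new labels, so tuple labels such as `('a', 'Estimates')` match none of the string keys and no data is picked up. A minimal sketch, assuming a small hypothetical `serieses` dict, that first shows this selection behaviour and then builds the frame with the plain keys before attaching the extra level via `pd.MultiIndex.from_product` (with "Estimates" as the outer level, as in the desired output):

```python
import pandas as pd

# Hypothetical stand-in for the question's dict of name -> Series
serieses = {
    "one": pd.Series([1, 1.1], index=["r1", "r2"]),
    "two": pd.Series([2, 2], index=["r1", "r2"]),
}

# With dict input, `columns` only selects keys: 'one' is found,
# while the unknown label 'three' becomes a column of NaN.
selected = pd.DataFrame(serieses, columns=["one", "three"])
print(selected)

# Build with the plain keys first, then swap in a two-level column index
df = pd.DataFrame(serieses)
df.columns = pd.MultiIndex.from_product([["Estimates"], df.columns], names=["l1", "l2"])
print(df)
```

Because the outer level is built from a one-element list, every column ends up under the same "Estimates" label.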

Answer 1

Answered by Alex Rothberg

This appears to work:

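import pandas as pd

data = {'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40], 'c': [100, 200, 300, 400]}

# The dict key passed to concat becomes the outer column level ("l1"),
# and the original column names become the inner level ("l2").
df = pd.concat({"Estimates": pd.DataFrame(data)}, axis=1, names=["l1", "l2"])
print(df)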

l1  Estimates         
l2          a   b    c
0           1  10  100
1           2  20  200
2           3  30  300
3           4  40  400
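
The same `concat` call extends naturally to several outer groups, one per dict key. A minimal sketch, assuming a hypothetical second group called "Actuals" with made-up values (the assertions avoid relying on the order of the outer groups):

```python
import pandas as pd

estimates = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
actuals = pd.DataFrame({"a": [1.5, 2.5], "b": [11, 19]})  # hypothetical second group

# Each dict key becomes one value of the outer level "l1"
df = pd.concat({"Estimates": estimates, "Actuals": actuals}, axis=1, names=["l1", "l2"])
print(df)

# Selecting an outer label returns a frame with the original flat columns
print(df["Actuals"])
```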

Answer 2

Answered by DimG

I know the question is really old, but with pandas version 0.19.1 one can use direct dict initialization with tuple keys:

import pandas as pd

# Tuple keys become the levels of a column MultiIndex
d = {('a', 'b'): [1, 2, 3, 4], ('a', 'c'): [5, 6, 7, 8]}
df = pd.DataFrame(d, index=['r1', 'r2', 'r3', 'r4'])
df.columns.names = ['l1', 'l2']
print(df)

l1  a   
l2  b  c
r1  1  5
r2  2  6
r3  3  7
r4  4  8
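
Building on the tuple-key approach above, a flat dict like the one used in the other answers can be converted to tuple keys with a dict comprehension. A minimal sketch, assuming the shared outer label is "Estimates" as in the question:

```python
import pandas as pd

data = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [100, 200, 300, 400]}

# Prefix every key with the shared outer label to get tuple keys
nested = {("Estimates", name): values for name, values in data.items()}

df = pd.DataFrame(nested, index=["r1", "r2", "r3", "r4"])
df.columns.names = ["l1", "l2"]
print(df)
```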

Answer 3

Answered by Rutger Kassies

I'm not sure, but I think a dict as input for your DataFrame and a MultiIndex for its columns don't play well together. Using an array as input instead makes it work.

I often prefer dicts as input, though. One way is to set the columns after creating the DataFrame:

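import numpy as np
import pandas as pd

data = {'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40], 'c': [100, 200, 300, 400]}
# Stack the dict values as rows, then transpose so each key becomes a column
df = pd.DataFrame(np.array(list(data.values())).T, index=['r1', 'r2', 'r3', 'r4'])

# Pair 'Estimates' with every key: ('Estimates', 'a'), ('Estimates', 'b'), ...
tups = list(zip(['Estimates'] * len(data), data.keys()))

df.columns = pd.MultiIndex.from_tuples(tups, names=['l1', 'l2'])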

l1          Estimates         
l2          a   b    c
r1          1  10  100
r2          2  20  200
r3          3  30  300
r4          4  40  400

Or, when using an array as input for the DataFrame:

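import numpy as np
import pandas as pd

data_arr = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
# Each row of data_arr is one column's values, hence the transpose below
tups = list(zip(['Estimates'] * data_arr.shape[0], ['a', 'b', 'c']))
df = pd.DataFrame(data_arr.T, index=['r1', 'r2', 'r3', 'r4'],
                  columns=pd.MultiIndex.from_tuples(tups, names=['l1', 'l2']))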

Which gives the same result.
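
As an alternative to the array-input variant above, `pd.MultiIndex.from_arrays` takes one list per level, which avoids the `zip` step. A minimal sketch reusing the same values (the `names` list of inner labels is an illustrative variable, not part of the original answer), followed by selecting the outer label to get the flat columns back:

```python
import numpy as np
import pandas as pd

data_arr = np.array([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])
names = ["a", "b", "c"]

# from_arrays takes one list per level, so no zip() is needed
cols = pd.MultiIndex.from_arrays([["Estimates"] * len(names), names], names=["l1", "l2"])
df = pd.DataFrame(data_arr.T, index=["r1", "r2", "r3", "r4"], columns=cols)

# Selecting the outer label drops that level again
print(df["Estimates"])
```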
