Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23600582/

Time: 2020-08-19 03:13:56  Source: igfitidea

Concatenate Pandas columns under new multi-index level

python pandas multi-index

Asked by Zero

Given a dictionary of data frames like:

dict = {'ABC': df1, 'XYZ' : df2}   # of any length...

where each data frame has the same columns and similar index, for example:

data           Open     High      Low    Close   Volume
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149

What is the simplest way to combine all the data frames into one, with a multi-index like:

symbol         ABC                                       XYZ
data           Open     High      Low    Close   Volume  Open ...
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  ...
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  ...
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  ...

I've tried a few methods, e.g. for each data frame, replacing the columns with a multi-index like .from_product(['ABC', columns]) and then concatenating along axis=1, without success.

Accepted answer by Karl D.

You can do it with concat (the keys argument will create the hierarchical columns index):

d = {'ABC': df1, 'XYZ': df2}
# keys become the outer level of the column index
print(pd.concat(list(d.values()), axis=1, keys=list(d.keys())))


                XYZ                                          ABC           \
               Open     High      Low    Close   Volume     Open     High   
Date                                                                        
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  0.18077  0.18800   
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  0.18439  0.21331   
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  0.19523  0.20970   


                Low    Close   Volume  
Date                                   
2002-01-17  0.16993  0.18439  1720833  
2002-01-18  0.18077  0.19523  2027866  
2002-01-21  0.19162  0.20608   771149

Really, concat wants lists, so the following is equivalent:

print(pd.concat([df1, df2], axis=1, keys=['ABC', 'XYZ']))
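
As a minimal sketch of the same idea, the dict can also be passed to concat directly, in which case its keys are used as the keys argument. The sample frames below are hypothetical (two columns only, with XYZ simply set to twice ABC), and the level name 'symbol' is taken from the desired output in the question:

```python
import pandas as pd

# Hypothetical sample data: two columns only, XYZ values are made up.
dates = pd.Index(pd.to_datetime(['2002-01-17', '2002-01-18', '2002-01-21']), name='Date')
fields = pd.Index(['Open', 'Close'], name='data')
df1 = pd.DataFrame([[0.18077, 0.18439], [0.18439, 0.19523], [0.19523, 0.20608]],
                   index=dates, columns=fields)
df2 = df1 * 2  # stand-in for a second symbol

d = {'ABC': df1, 'XYZ': df2}

# Passing the mapping itself lets concat take the keys from the dict;
# names labels the new outer column level.
combined = pd.concat(d, axis=1, names=['symbol'])
print(combined)
```

With this layout, combined['ABC'] should give back the columns for a single symbol.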

Answer by Woody Pride

Add a symbol column to your dataframes and set the index to include the symbol column, concat and then unstack that level:

The following assumes that there are as many symbols as DataFrames in your dict, and also that you check that the order of symbols is as you want it based on the order of the dict keys:

DF_dict = {'ABC': df1, 'XYZ': df2}
dict_keys = list(DF_dict.keys())
symbols = ['ABC', 'XYZ']

# Tag each frame with its symbol and move it into the index (modifies the frames in place)
for key, symbol in zip(dict_keys, symbols):
    DF_dict[key]['symbol'] = symbol
    DF_dict[key].reset_index(inplace=True)
    DF_dict[key].set_index(['symbol', 'Date'], inplace=True)

# Stack the frames row-wise, then pivot the symbol level into the columns
DF = pd.concat([DF_dict[key] for key in dict_keys])
DF = DF.unstack('symbol')

I think that would be the approach I would take. Some people are against the inplace syntax. I use it here only as a convenience.
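
For those who prefer to avoid inplace, here is a rough sketch of the same stack-then-unstack idea that leaves the original frames untouched. The sample data is hypothetical, and the final swaplevel/sort_index step is an addition (not part of the approach above) that moves the symbol to the outer column level, as in the question's desired layout:

```python
import pandas as pd

# Hypothetical sample data: two columns only, XYZ values are made up.
dates = pd.Index(pd.to_datetime(['2002-01-17', '2002-01-18', '2002-01-21']), name='Date')
df1 = pd.DataFrame({'Open': [0.18077, 0.18439, 0.19523],
                    'Close': [0.18439, 0.19523, 0.20608]}, index=dates)
df2 = df1 * 2  # stand-in for a second symbol
frames = {'ABC': df1, 'XYZ': df2}

# Tag each frame with its symbol and append it to the index, returning new frames.
tagged = [
    df.assign(symbol=symbol).set_index('symbol', append=True)
    for symbol, df in frames.items()
]

long_form = pd.concat(tagged)        # row index levels: Date, symbol
wide = long_form.unstack('symbol')   # column levels: (field, symbol)

# Put symbol on the outer column level and group each symbol's columns together.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide)
```

Note that sort_index orders the fields alphabetically within each symbol, so the column order may differ from the original frames.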