Python: Concatenate Pandas columns under a new multi-index level

Question

Asked by Zero

Given a dictionary of data frames like:

dict = {'ABC': df1, 'XYZ' : df2}   # of any length...

where each data frame has the same columns and similar index, for example:

data           Open     High      Low    Close   Volume
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149

What is the simplest way to combine all the data frames into one, with a multi-index like:

symbol         ABC                                       XYZ
data           Open     High      Low    Close   Volume  Open ...
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  ...
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  ...
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  ...

I've tried a few methods, e.g. for each data frame, replacing the columns with a multi-index built with .from_product(['ABC', columns]) and then concatenating along axis=1, without success.
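
For reference, one likely reason an attempt along these lines fails: from_product expects a list of iterables, so a bare string such as 'ABC' is iterated character by character and yields more column labels than the frame has columns. A minimal sketch with the symbol wrapped in a list follows. The small frames are made-up stand-ins for df1 and df2, and this is a guess at the original attempt, not the asker's actual code:

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (illustrative values only).
dates = pd.DatetimeIndex(["2002-01-17", "2002-01-18", "2002-01-21"], name="Date")
df1 = pd.DataFrame({"Open": [0.18077, 0.18439, 0.19523],
                    "Close": [0.18439, 0.19523, 0.20608]}, index=dates)
df2 = df1 * 2

frames = {"ABC": df1, "XYZ": df2}
relabelled = []
for symbol, df in frames.items():
    df = df.copy()  # leave the caller's frame unchanged
    # Note the list around the symbol: [symbol], not symbol.
    df.columns = pd.MultiIndex.from_product([[symbol], df.columns],
                                            names=["symbol", "data"])
    relabelled.append(df)

# Each frame already carries two-level columns, so no keys argument is needed.
combined = pd.concat(relabelled, axis=1)
print(combined)
```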

Answer 1

Accepted answer by Karl D.

You can do it with concat (the keys argument will create the hierarchical columns index):

import pandas as pd

d = {'ABC': df1, 'XYZ': df2}
print(pd.concat(list(d.values()), axis=1, keys=list(d.keys())))


                XYZ                                          ABC           \
               Open     High      Low    Close   Volume     Open     High   
Date                                                                        
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  0.18077  0.18800   
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  0.18439  0.21331   
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  0.19523  0.20970   


                Low    Close   Volume  
Date                                   
2002-01-17  0.16993  0.18439  1720833  
2002-01-18  0.18077  0.19523  2027866  
2002-01-21  0.19162  0.20608   771149

concat really wants lists, so the following is equivalent:

print(pd.concat([df1, df2], axis=1, keys=['ABC', 'XYZ']))
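
A self-contained sketch of the same idea, using two small made-up frames in place of df1 and df2 (the values are illustrative, not the asker's data). It assumes a reasonably recent pandas, where concat also accepts the dict itself and uses its keys as the outer column level. The names argument labels the two column levels so the result matches the symbol/data layout asked for in the question:

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2: same columns, same dates.
dates = pd.DatetimeIndex(["2002-01-17", "2002-01-18", "2002-01-21"], name="Date")
df1 = pd.DataFrame({"Open": [0.18077, 0.18439, 0.19523],
                    "Close": [0.18439, 0.19523, 0.20608]}, index=dates)
df2 = df1 * 2  # made-up second symbol

d = {"ABC": df1, "XYZ": df2}

# Passing the dict directly: its keys become the outer column level.
combined = pd.concat(d, axis=1, names=["symbol", "data"])

print(combined)
print(combined["XYZ"])  # all columns for a single symbol
```

Selecting combined["XYZ"] returns the original single-level columns for that symbol, which is a quick way to check that the outer level was built as intended.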

Answer 2

Answered by Woody Pride

Add a symbol column to your dataframes and set the index to include the symbol column, concat and then unstack that level:

The following assumes that there are as many symbols as DataFrames in your dict, and that you have checked that the order of the symbols list matches the order of the dict keys:

import pandas as pd

DF_dict = {'ABC': df1, 'XYZ': df2}
dict_keys = list(DF_dict.keys())  # list() so the keys can be indexed in Python 3
symbols = ['ABC', 'XYZ']          # must follow the order of dict_keys

for x in range(len(symbols)):
    DF_dict[dict_keys[x]]['symbol'] = symbols[x]
    DF_dict[dict_keys[x]].reset_index(inplace=True)
    DF_dict[dict_keys[x]].set_index(['symbol', 'Date'], inplace=True)

DF = pd.concat([DF_dict[df] for df in dict_keys])
DF = DF.unstack('symbol')  # 'symbol' becomes the innermost column level

I think that would be the approach I would take. Some people are against the inplace syntax. I use it here only as a convenience.
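
If mutating the input frames is a concern, the same stack-then-unstack idea can be sketched without inplace. This is one possible variant, not the answerer's code: the frames below are made-up stand-ins for df1 and df2, and the final swaplevel/sort_index step is an addition that moves symbol to the outer column level, since unstack places it innermost.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (illustrative values only).
dates = pd.DatetimeIndex(["2002-01-17", "2002-01-18", "2002-01-21"], name="Date")
df1 = pd.DataFrame({"Open": [0.18077, 0.18439, 0.19523],
                    "Close": [0.18439, 0.19523, 0.20608]}, index=dates)
df2 = df1 * 2
DF_dict = {"ABC": df1, "XYZ": df2}

# Build new frames indexed by (Date, symbol); the originals are not modified.
pieces = [
    df.assign(symbol=symbol).set_index("symbol", append=True)
    for symbol, df in DF_dict.items()
]

DF = pd.concat(pieces)     # long format: one row per (Date, symbol)
DF = DF.unstack("symbol")  # columns are now (column name, symbol)
# Move symbol to the outer level; sort_index groups the columns by symbol
# and also orders the inner column names alphabetically.
DF = DF.swaplevel(axis=1).sort_index(axis=1)
print(DF)
```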
