pandas: merging multiple DataFrames with non-unique indexes

Question

Asked by Bartek R.

Given two DataFrames with non-unique indexes and multi-level (MultiIndex) columns:

ars:

           arsenal   arsenal   arsenal   arsenal
NaN             B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      26.0
2015-04-14     NaN       NaN       NaN       NaN
2015-04-13    26.0      26.0      23.0       NaN
2015-04-13    22.0      21.0      19.0       NaN

che:

           chelsea   chelsea   chelsea   chelsea
NaN             B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      1.01
2015-04-14    1.02       NaN       NaN       NaN
2015-04-14     NaN      1.05       NaN       NaN

Here they are in CSV format:

,arsenal,arsenal,arsenal,arsenal
,B3,SK,BX,BY
2015-04-15,,,,26.0
2015-04-14,,,,
2015-04-13,26.0,26.0,23.0,
2015-04-13,22.0,21.0,19.0,

,chelsea,chelsea,chelsea,chelsea
,B3,SK,BX,BY
2015-04-15,,,,1.01
2015-04-14,1.02,,,
2015-04-14,,1.05,,

I would like to join/merge them with something like an outer join, so that no rows are dropped.

I would like the output to be:

            arsenal  arsenal   arsenal   arsenal chelsea   chelsea   chelsea   chelsea
NaN             B3        SK        BX        BY      B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      26.0     NaN       NaN       NaN      1.01
2015-04-14     NaN       NaN       NaN       NaN    1.02       NaN       NaN       NaN
2015-04-14     NaN       NaN       NaN       NaN     NaN      1.05       NaN       NaN
2015-04-13    26.0      26.0      23.0       NaN     NaN       NaN       NaN       NaN
2015-04-13    22.0      21.0      19.0       NaN     NaN       NaN       NaN       NaN

None of the pandas tools I know worked: merge, join, concat. merge's outer join gives a Cartesian product of the rows that share an index value, which is not what I am looking for, while concat can't handle non-unique indexes.

Do you have any ideas how this can be achieved?

Note: the DataFrames won't be of identical length.

Answer 1

Accepted answer by Bartek R.

I've managed to sort it out using pandas' concat function.

First, we need to add a MultiIndex level so that the index becomes unique:

import pandas as pd

ars = pd.read_csv("ars.csv", index_col=[0], header=[0,1])
che = pd.read_csv("che.csv", index_col=[0], header=[0,1])

# Name the date index, then append a running row counter ("num") as a
# second index level so that every (date, num) pair is unique
ars.index.name = "date"
ars["num"] = range(len(ars.index))
ars = ars.set_index("num", append=True)

che.index.name = "date"
che["num"] = range(len(che.index))
che = che.set_index("num", append=True)

Now we can use concat:

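df = pd.concat([ars, che], axis=1)
# Sort by date (newest first), then by the helper counter.
# The by= argument of sort_index is no longer available in current pandas,
# so sort on the index levels directly.
df = df.sort_index(level=["date", "num"], ascending=[False, True])
# Drop the helper level, leaving only the (non-unique) date index
df = df.droplevel("num")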

Output:

                arsenal             chelsea                
                B3  SK  BX  BY      B3    SK  BX    BY
date                                                  
2015-04-15     NaN NaN NaN  26     NaN   NaN NaN  1.01
2015-04-14     NaN NaN NaN NaN    1.02   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN     NaN  1.05 NaN   NaN
2015-04-13      26  26  23 NaN     NaN   NaN NaN   NaN
2015-04-13      22  21  19 NaN     NaN   NaN NaN   NaN
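
A running counter over the whole frame pairs rows by their position in each file, which happens to line up for the sample data. If the dates sit at different positions in the two frames, a counter that restarts for each date may align them more reliably. The following is a minimal sketch of that variation, not part of the original answer: the helper name add_occurrence_level is hypothetical, and the frames are built inline from the question's data rather than read from CSV.

```python
import numpy as np
import pandas as pd

FIELDS = ["B3", "SK", "BX", "BY"]
nan = np.nan


def make_frame(team, dates, rows):
    """Build a frame shaped like the question's data (two column levels)."""
    columns = pd.MultiIndex.from_product([[team], FIELDS])
    return pd.DataFrame(rows, index=pd.Index(dates, name="date"), columns=columns)


def add_occurrence_level(df, name="num"):
    """Append a counter that restarts at 0 for every distinct date.

    Hypothetical helper: the (date, num) pairs it produces are unique,
    which is what concat needs in order to align the rows.
    """
    counter = df.groupby(level=0).cumcount().to_numpy()
    return df.set_index(pd.Index(counter, name=name), append=True)


ars = make_frame(
    "arsenal",
    ["2015-04-15", "2015-04-14", "2015-04-13", "2015-04-13"],
    [[nan, nan, nan, 26.0],
     [nan, nan, nan, nan],
     [26.0, 26.0, 23.0, nan],
     [22.0, 21.0, 19.0, nan]],
)
che = make_frame(
    "chelsea",
    ["2015-04-15", "2015-04-14", "2015-04-14"],
    [[nan, nan, nan, 1.01],
     [1.02, nan, nan, nan],
     [nan, 1.05, nan, nan]],
)

# Align on the unique (date, num) pairs; rows missing on one side become NaN
merged = pd.concat([add_occurrence_level(ars), add_occurrence_level(che)], axis=1)
# Newest date first, occurrences in their original order, then drop the helper
merged = merged.sort_index(level=["date", "num"], ascending=[False, True])
merged = merged.droplevel("num")
print(merged)
```

With this counter, the n-th row for a given date in one frame is matched with the n-th row for the same date in the other, regardless of where that date appears in each file.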

Answer 2

Answer by TheBlackCat

You want to use the how='outer' argument for join (test1.csv and test2.csv are the files you gave):

df1 = pd.read_csv('test1.csv', index_col=0, header=[0,1])
df2 = pd.read_csv('test2.csv', index_col=0, header=[0,1])

df = df1.join(df2, how='outer')

This is the result I get:

           arsenal             chelsea  
                B3  SK  BX  BY      B3    SK  BX    BY
2015-04-13      26  26  23 NaN     NaN   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN    1.02   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN     NaN  1.05 NaN   NaN
2015-04-15     NaN NaN NaN  26     NaN   NaN NaN  1.01

Answer 3

Answer by AdelNick

You need to use pandas.merge:

pd.merge(ars, che, left_index = True, right_index = True, how = 'outer')

It can handle non-unique indexes and DataFrames of different sizes.
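
One thing worth checking before relying on an index-based outer merge or join: when the same index value is repeated on both sides, each left row is paired with each right row for that value, so the row count multiplies. A small sketch with made-up frames (the names left and right and their values are illustrative only, not the question's data):

```python
import pandas as pd

# Both frames repeat the index value "d" twice
left = pd.DataFrame({"x": [1, 2]}, index=["d", "d"])
right = pd.DataFrame({"y": [10, 20]}, index=["d", "d"])

# Index-based outer merge: every left "d" row meets every right "d" row
merged = pd.merge(left, right, left_index=True, right_index=True, how="outer")

# DataFrame.join with how="outer" pairs the rows the same way
joined = left.join(right, how="outer")

print(len(merged), len(joined))
```

For the sample data in the question, no date is repeated in both frames at once, so the result has the same five rows as the desired output. With dates duplicated on both sides, the extra pairings would appear.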
