pandas: merging multiple DataFrames with non-unique indexes

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29656155/

Time: 2020-09-13 23:12:32  Source: igfitidea

Merging multiple dataframes with non unique indexes

Tags: python, python-3.x, join, pandas, merge

Asked by Bartek R.

Given two DataFrames with non-unique indexes and multi-level columns:

ars:

           arsenal   arsenal   arsenal   arsenal
NaN             B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      26.0
2015-04-14     NaN       NaN       NaN       NaN
2015-04-13    26.0      26.0      23.0       NaN
2015-04-13    22.0      21.0      19.0       NaN

che:

           chelsea   chelsea   chelsea   chelsea
NaN             B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      1.01
2015-04-14    1.02       NaN       NaN       NaN
2015-04-14     NaN      1.05       NaN       NaN

Here they are in CSV format:

,arsenal,arsenal,arsenal,arsenal
,B3,SK,BX,BY
2015-04-15,,,,26.0
2015-04-14,,,,
2015-04-13,26.0,26.0,23.0,
2015-04-13,22.0,21.0,19.0,


,chelsea,chelsea,chelsea,chelsea
,B3,SK,BX,BY
2015-04-15,,,,1.01
2015-04-14,1.02,,,
2015-04-14,,1.05,,

I would like to join/merge them, sort of an outer join so that rows are not dropped.

I would like the output to be:

            arsenal  arsenal   arsenal   arsenal chelsea   chelsea   chelsea   chelsea
NaN             B3        SK        BX        BY      B3        SK        BX        BY
2015-04-15     NaN       NaN       NaN      26.0     NaN       NaN       NaN      1.01
2015-04-14     NaN       NaN       NaN       NaN    1.02       NaN       NaN       NaN
2015-04-14     NaN       NaN       NaN       NaN     NaN      1.05       NaN       NaN
2015-04-13    26.0      26.0      23.0       NaN     NaN       NaN       NaN       NaN
2015-04-13    22.0      21.0      19.0       NaN     NaN       NaN       NaN       NaN

None of the pandas tools I know worked: merge, join, concat. merge's outer join gives a Cartesian product of the rows that share an index value, which is not what I am looking for, while concat can't handle non-unique indexes.

Do you have any ideas how this can be achieved?

Note: the lengths of the DataFrames won't be identical.

Accepted answer by Bartek R.

I've managed to sort it out using pandas' concat function.

First, we need to add a MultiIndex level so that the index becomes unique:

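import pandas as pd

# Both files have a two-row column header and the dates in the first column.
ars = pd.read_csv("ars.csv", index_col=[0], header=[0,1])
che = pd.read_csv("che.csv", index_col=[0], header=[0,1])

# Append a running row number as a second index level so (date, num) is unique.
ars.index.name = "date"
ars["num"] = range(0, len(ars.index))
ars = ars.set_index("num", append=True)

che.index.name = "date"
che["num"] = range(0, len(che.index))
che = che.set_index("num", append=True)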
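
The running row number above counts across the whole frame, so the two frames line up by row position. One possible variation, not part of the original answer, is to number the rows within each date instead. The sketch below assumes that is the alignment you want; the helper name add_occurrence_level and the small single-level frames are hypothetical and only for illustration.

```python
import pandas as pd

def add_occurrence_level(df, name="num"):
    """Return a copy whose index is (date, n-th occurrence of that date)."""
    out = df.copy()
    # cumcount() numbers the rows inside each index group 0, 1, 2, ...
    counter = out.groupby(level=0).cumcount().to_numpy()
    out.index = pd.MultiIndex.from_arrays([out.index, counter],
                                          names=["date", name])
    return out

# Hypothetical toy data with a duplicated date on one side.
ars = pd.DataFrame({"B3": [1.0, 2.0, 3.0]},
                   index=["2015-04-13", "2015-04-13", "2015-04-15"])
che = pd.DataFrame({"SK": [10.0, 20.0]},
                   index=["2015-04-13", "2015-04-14"])

# The (date, num) pairs are unique, so concat can align the frames.
merged = pd.concat([add_occurrence_level(ars), add_occurrence_level(che)],
                   axis=1)
# Newest date first, occurrences in their original order, then drop the helper.
merged = merged.sort_index(level=[0, 1], ascending=[False, True])
merged.index = merged.index.droplevel(1)
print(merged)
```

With this numbering, the first row for a date in one frame is paired with the first row for the same date in the other, regardless of where those rows sit in each file.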

Now we can use concat:

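df = pd.concat([ars, che], axis=1)
# sort_index(by=...) is no longer available in current pandas;
# sort on the two index levels instead.
df = df.sort_index(level=["date", "num"], ascending=[False, True])
# Drop the helper level so only the date remains in the index.
df.index = df.index.droplevel(1)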

Output:

                arsenal             chelsea                
                B3  SK  BX  BY      B3    SK  BX    BY
date                                                  
2015-04-15     NaN NaN NaN  26     NaN   NaN NaN  1.01
2015-04-14     NaN NaN NaN NaN    1.02   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN     NaN  1.05 NaN   NaN
2015-04-13      26  26  23 NaN     NaN   NaN NaN   NaN
2015-04-13      22  21  19 NaN     NaN   NaN NaN   NaN

Answer by TheBlackCat

You want to use the how='outer' argument for join (test1.csv and test2.csv are the files you gave):

df1 = pd.read_csv('test1.csv', index_col=0, header=[0,1])
df2 = pd.read_csv('test2.csv', index_col=0, header=[0,1])

df = df1.join(df2, how='outer')

This is the result I get:

           arsenal             chelsea  
                B3  SK  BX  BY      B3    SK  BX    BY
2015-04-13      26  26  23 NaN     NaN   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN    1.02   NaN NaN   NaN
2015-04-14     NaN NaN NaN NaN     NaN  1.05 NaN   NaN
2015-04-15     NaN NaN NaN  26     NaN   NaN NaN  1.01
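
To see what an outer join does with duplicated dates, the frames from the question can be rebuilt in memory and the rows counted per date. This is only a sketch: the frames are constructed by hand from the CSV shown in the question rather than read from files, and the helper team_columns is a hypothetical name.

```python
import pandas as pd

nan = float("nan")

def team_columns(team):
    """Two-level columns such as ('arsenal', 'B3'), ('arsenal', 'SK'), ..."""
    return pd.MultiIndex.from_product([[team], ["B3", "SK", "BX", "BY"]])

ars = pd.DataFrame(
    [[nan, nan, nan, 26.0],
     [nan, nan, nan, nan],
     [26.0, 26.0, 23.0, nan],
     [22.0, 21.0, 19.0, nan]],
    index=["2015-04-15", "2015-04-14", "2015-04-13", "2015-04-13"],
    columns=team_columns("arsenal"),
)
che = pd.DataFrame(
    [[nan, nan, nan, 1.01],
     [1.02, nan, nan, nan],
     [nan, 1.05, nan, nan]],
    index=["2015-04-15", "2015-04-14", "2015-04-14"],
    columns=team_columns("chelsea"),
)

# Outer join on the index: every left row for a date is paired with
# every right row for the same date.
joined = ars.join(che, how="outer")
print(joined)
print(joined.index.value_counts())
```

In this data no date is duplicated in both frames at once, so each date contributes at most two rows and nothing is multiplied.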

Answer by AdelNick

You need to use pandas.merge:

pd.merge(ars, che, left_index = True, right_index = True, how = 'outer')

It can handle non-unique indexes and DataFrames of different sizes.
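
One caveat worth checking on your own data: when the same index value is repeated in both frames, merge pairs every matching left row with every matching right row. The toy frames below are hypothetical and only illustrate that behavior.

```python
import pandas as pd

# The same date appears twice on each side.
left = pd.DataFrame({"a": [1, 2]}, index=["2015-04-14", "2015-04-14"])
right = pd.DataFrame({"b": [10, 20]}, index=["2015-04-14", "2015-04-14"])

merged = pd.merge(left, right, left_index=True, right_index=True, how="outer")

# 2 left rows x 2 right rows for the shared date give 4 rows in total.
print(merged)
print(len(merged))
```

If that multiplication is not wanted, adding an extra index level that makes each row unique before combining the frames avoids it.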
