Merging multiple dataframes with non-unique indexes (pandas)
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29656155/
Asked by Bartek R.
Given two DataFrames with non-unique indexes and multi-level columns:
ars:
            arsenal  arsenal  arsenal  arsenal
NaN              B3       SK       BX       BY
2015-04-15      NaN      NaN      NaN     26.0
2015-04-14      NaN      NaN      NaN      NaN
2015-04-13     26.0     26.0     23.0      NaN
2015-04-13     22.0     21.0     19.0      NaN
che:
            chelsea  chelsea  chelsea  chelsea
NaN              B3       SK       BX       BY
2015-04-15      NaN      NaN      NaN     1.01
2015-04-14     1.02      NaN      NaN      NaN
2015-04-14      NaN     1.05      NaN      NaN
Here they are in CSV format:
,arsenal,arsenal,arsenal,arsenal
,B3,SK,BX,BY
2015-04-15,,,,26.0
2015-04-14,,,,
2015-04-13,26.0,26.0,23.0,
2015-04-13,22.0,21.0,19.0,
,chelsea,chelsea,chelsea,chelsea
,B3,SK,BX,BY
2015-04-15,,,,1.01
2015-04-14,1.02,,,
2015-04-14,,1.05,,
I would like to join/merge them with something like an outer join, so that no rows are dropped.
I would like the output to be:
            arsenal  arsenal  arsenal  arsenal  chelsea  chelsea  chelsea  chelsea
NaN              B3       SK       BX       BY       B3       SK       BX       BY
2015-04-15      NaN      NaN      NaN     26.0      NaN      NaN      NaN     1.01
2015-04-14      NaN      NaN      NaN      NaN     1.02      NaN      NaN      NaN
2015-04-14      NaN      NaN      NaN      NaN      NaN     1.05      NaN      NaN
2015-04-13     26.0     26.0     23.0      NaN      NaN      NaN      NaN      NaN
2015-04-13     22.0     21.0     19.0      NaN      NaN      NaN      NaN      NaN
None of the pandas tools I know of worked: merge, join, concat. merge's outer join produces a Cartesian product of the matching rows, which is not what I am looking for, while concat can't handle non-unique indexes.
Do you have any ideas how this can be achieved?
Note: the lengths of the dataframes won't be identical.
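To make the merge complaint concrete, here is a minimal sketch with hypothetical toy frames (the names left, right, x and y are illustrative and not from the question). When the same index value is duplicated in both frames, an outer merge pairs every left row with every right row for that value:

```python
import pandas as pd

# Hypothetical toy frames: the index value "a" is duplicated on BOTH sides.
left = pd.DataFrame({"x": [1, 2]}, index=["a", "a"])
right = pd.DataFrame({"y": [10, 20]}, index=["a", "a"])

# Index-on-index outer merge: each of the 2 left rows is paired with
# each of the 2 right rows, so the result has 2 x 2 = 4 rows.
merged = pd.merge(left, right, left_index=True, right_index=True, how="outer")
print(merged)
print(len(merged))
```

In the sample data above, 2015-04-13 is duplicated only in ars and 2015-04-14 only in che, so this multiplication would not show up there. It appears once a date repeats in both frames.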
Accepted answer by Bartek R.
I've managed to sort it out using pandas' concat function.
First, we need to add a MultiIndex level so that the index becomes unique:
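import pandas as pd

# Read each file with the first column as the index and two header rows
ars = pd.read_csv("ars.csv", index_col=[0], header=[0, 1])
che = pd.read_csv("che.csv", index_col=[0], header=[0, 1])

# Append a running row number as a second index level so (date, num) is unique
ars.index.name = "date"
ars["num"] = range(len(ars.index))
ars = ars.set_index("num", append=True)

che.index.name = "date"
che["num"] = range(len(che.index))
che = che.set_index("num", append=True)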
Now we can use concat:
df = pd.concat([ars, che], axis=1)
# sort_index(by=...) has been removed from pandas, so sort on the index levels instead
df = df.sort_index(level=["date", "num"], ascending=[False, True])
# Drop the helper "num" level again
df.index = df.index.droplevel(1)
Output:
           arsenal                chelsea
                B3   SK   BX   BY       B3    SK   BX    BY
date
2015-04-15     NaN  NaN  NaN   26      NaN   NaN  NaN  1.01
2015-04-14     NaN  NaN  NaN  NaN     1.02   NaN  NaN   NaN
2015-04-14     NaN  NaN  NaN  NaN      NaN  1.05  NaN   NaN
2015-04-13      26   26   23  NaN      NaN   NaN  NaN   NaN
2015-04-13      22   21   19  NaN      NaN   NaN  NaN   NaN
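The range-based num counter above numbers rows by their position in each file, so two rows are paired only when they share both the date and the position. A possible variation, sketched here as an assumption rather than as part of the original answer, numbers the rows within each date using groupby(...).cumcount(). The frames are rebuilt in memory from the question's CSV data, and add_occurrence_level is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = ["B3", "SK", "BX", "BY"]

# The question's data, rebuilt in memory instead of read from CSV files.
ars = pd.DataFrame(
    [[nan, nan, nan, 26.0], [nan, nan, nan, nan],
     [26.0, 26.0, 23.0, nan], [22.0, 21.0, 19.0, nan]],
    index=pd.Index(["2015-04-15", "2015-04-14", "2015-04-13", "2015-04-13"], name="date"),
    columns=pd.MultiIndex.from_product([["arsenal"], cols]),
)
che = pd.DataFrame(
    [[nan, nan, nan, 1.01], [1.02, nan, nan, nan], [nan, 1.05, nan, nan]],
    index=pd.Index(["2015-04-15", "2015-04-14", "2015-04-14"], name="date"),
    columns=pd.MultiIndex.from_product([["chelsea"], cols]),
)


def add_occurrence_level(df):
    """Append a per-date counter (0, 1, ...) so that (date, num) is unique."""
    num = df.groupby(level="date").cumcount().to_numpy()
    return df.set_index(pd.Index(num, name="num"), append=True)


# Align on the now-unique (date, num) pairs, then restore the original layout.
combined = pd.concat([add_occurrence_level(ars), add_occurrence_level(che)], axis=1)
combined = combined.sort_index(level=["date", "num"], ascending=[False, True])
combined = combined.droplevel("num")
print(combined)
```

With this counter, the n-th row for a date in one frame lines up with the n-th row for the same date in the other, regardless of where those rows sit in each file.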
Answer by TheBlackCat
You want to use the how='outer' argument for join (test1.csv and test2.csv are the files you gave):
import pandas as pd

df1 = pd.read_csv('test1.csv', index_col=0, header=[0, 1])
df2 = pd.read_csv('test2.csv', index_col=0, header=[0, 1])
df = df1.join(df2, how='outer')
This is the result I get:
           arsenal                chelsea
                B3   SK   BX   BY       B3    SK   BX    BY
2015-04-13      26   26   23  NaN      NaN   NaN  NaN   NaN
2015-04-14     NaN  NaN  NaN  NaN     1.02   NaN  NaN   NaN
2015-04-14     NaN  NaN  NaN  NaN      NaN  1.05  NaN   NaN
2015-04-15     NaN  NaN  NaN   26      NaN   NaN  NaN  1.01
Answer by AdelNick
You need to use pandas.merge:
pd.merge(ars, che, left_index=True, right_index=True, how='outer')
It can handle non-unique indexes and dataframes of different sizes.
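Whether an index-based outer merge multiplies rows depends on whether some index value is duplicated in both frames at once. The following sketch checks for that before merging. It uses the question's data rebuilt in memory, and make_frame and shared_duplicate_keys are hypothetical helper names, not pandas functions:

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = ["B3", "SK", "BX", "BY"]


def make_frame(team, dates, rows):
    """Build a frame shaped like the question's data (two-level columns)."""
    return pd.DataFrame(
        rows,
        index=pd.Index(dates, name="date"),
        columns=pd.MultiIndex.from_product([[team], cols]),
    )


def shared_duplicate_keys(left, right):
    """Return the index values that are duplicated in BOTH frames."""
    left_dups = left.index[left.index.duplicated()].unique()
    right_dups = right.index[right.index.duplicated()].unique()
    return left_dups.intersection(right_dups)


ars = make_frame(
    "arsenal",
    ["2015-04-15", "2015-04-14", "2015-04-13", "2015-04-13"],
    [[nan, nan, nan, 26.0], [nan, nan, nan, nan],
     [26.0, 26.0, 23.0, nan], [22.0, 21.0, 19.0, nan]],
)
che = make_frame(
    "chelsea",
    ["2015-04-15", "2015-04-14", "2015-04-14"],
    [[nan, nan, nan, 1.01], [1.02, nan, nan, nan], [nan, 1.05, nan, nan]],
)

# An empty result means no date can trigger a many-to-many row multiplication.
shared = shared_duplicate_keys(ars, che)
print(list(shared))

merged = pd.merge(ars, che, left_index=True, right_index=True, how="outer")
print(merged.shape)
```

For this data no date is duplicated on both sides, so the merge keeps one row per original row pairing. If shared were non-empty, adding a per-date counter level before combining would be one way to avoid the extra rows.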

