Original URL: http://stackoverflow.com/questions/49787325/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Full outer join of two or more data frames
Asked by ps0604
Given the following three Pandas data frames, I need to merge them in a way similar to an SQL full outer join. Note that the key is a multi-index made of type_N and id_N, with N = 1, 2, 3:
import pandas as pd
raw_data = {
'type_1': [0, 1, 1,1],
'id_1': ['3', '4', '5','5'],
'name_1': ['Alex', 'Amy', 'Allen', 'Jane']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'id_1', 'name_1' ])
raw_datab = {
'type_2': [1, 1, 1, 0],
'id_2': ['4', '5', '5', '7'],
'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'id_2', 'name_2'])
raw_datac = {
'type_3': [1, 0],
'id_3': ['4', '7'],
'name_3': ['School', 'White']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'id_3', 'name_3'])
The expected result should be:
type_1 id_1 name_1 type_2 id_2 name_2 type_3 id_3 name_3
0      3    Alex   NaN    NaN  NaN    NaN    NaN  NaN
1      4    Amy    1      4    Bill   1      4    School
1      5    Allen  1      5    Brian  NaN    NaN  NaN
1      5    Allen  1      5    Joe    NaN    NaN  NaN
1      5    Jane   1      5    Brian  NaN    NaN  NaN
1      5    Jane   1      5    Joe    NaN    NaN  NaN
NaN    NaN  NaN    0      7    Bryce  0      7    White
How can this be achieved in Pandas?
Answered by piRSquared
I'll propose that you make life less complicated and stop using different names for the columns you want to merge on. Move the key columns into the index under common names, then join on the index:
da = df_a.set_index(['type_1', 'id_1']).rename_axis(['type', 'id'])
db = df_b.set_index(['type_2', 'id_2']).rename_axis(['type', 'id'])
dc = df_c.set_index(['type_3', 'id_3']).rename_axis(['type', 'id'])
da.join(db, how='outer').join(dc, how='outer')
         name_1  name_2  name_3
type id
0    3     Alex     NaN     NaN
     7      NaN   Bryce   White
1    4      Amy    Bill  School
     5    Allen   Brian     NaN
     5    Allen     Joe     NaN
     5     Jane   Brian     NaN
     5     Jane     Joe     NaN
Here's an obnoxious way to get those other columns (the per-frame type_N and id_N) back:
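from cytoolz.dicttoolz import merge  # third-party package: cytoolz

# Keep the joined result from above so it can be extended
d = da.join(db, how='outer').join(dc, how='outer')
# Rebuild the index levels as ordinary columns, aligned on the same index
i = pd.DataFrame(d.index.values.tolist(), d.index, d.index.names)
# For each N, blank the keys where name_N is missing and add them as type_N / id_N
d = d.assign(**merge(
    i.mask(d[f'name_{j}'].isna()).add_suffix(f'_{j}').to_dict('list')
    for j in [1, 2, 3]
))
# Order the columns by their numeric suffix
d[sorted(d.columns, key=lambda x: x.split('_')[::-1])]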
         id_1  name_1  type_1  id_2  name_2  type_2  id_3  name_3  type_3
type id
0    3      3    Alex     0.0   NaN     NaN     NaN   NaN     NaN     NaN
     7    NaN     NaN     NaN     7   Bryce     0.0     7   White     0.0
1    4      4     Amy     1.0     4    Bill     1.0     4  School     1.0
     5      5   Allen     1.0     5   Brian     1.0   NaN     NaN     NaN
     5      5   Allen     1.0     5     Joe     1.0   NaN     NaN     NaN
     5      5    Jane     1.0     5   Brian     1.0   NaN     NaN     NaN
     5      5    Jane     1.0     5     Joe     1.0   NaN     NaN     NaN
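A possibly simpler route to the same columns, offered as a sketch rather than as part of this answer: set_index accepts drop=False, which keeps the key columns in the frame while also using them as the index. After the outer joins, each type_N / id_N column is then NaN exactly where that frame contributed no row. The helper name index_on_keys and the variable wide are made up for this illustration.

```python
import pandas as pd

df_a = pd.DataFrame({'type_1': [0, 1, 1, 1],
                     'id_1': ['3', '4', '5', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0],
                     'id_2': ['4', '5', '5', '7'],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0],
                     'id_3': ['4', '7'],
                     'name_3': ['School', 'White']})


def index_on_keys(frame, n):
    """Hypothetical helper: index a frame by (type_n, id_n) but keep both columns."""
    keys = [f'type_{n}', f'id_{n}']
    # drop=False leaves the key columns in place; rename_axis gives the
    # index levels common names so the frames can be joined on the index.
    return frame.set_index(keys, drop=False).rename_axis(['type', 'id'])


da, db, dc = (index_on_keys(f, n) for n, f in enumerate([df_a, df_b, df_c], start=1))

# Two outer joins on the shared (type, id) index
wide = da.join(db, how='outer').join(dc, how='outer')
print(wide.reset_index(drop=True))
```

The key columns that pick up NaN values are upcast (for example type_1 becomes float), which is the same effect visible in the output above.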
Answered by user3483203
You could use two consecutive merges, first on df_a and df_b, and then on df_c:
In [49]: df_temp = df_a.merge(df_b, how='outer', left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])
In [50]: df_temp.merge(df_c, how='outer', left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])
Out[50]:
   type_1  id_1  name_1  type_2  id_2  name_2  type_3  id_3  name_3
0     0.0     3    Alex     NaN   NaN     NaN     NaN   NaN     NaN
1     1.0     4     Amy       1     4    Bill     1.0     4  School
2     1.0     5   Allen       1     5   Brian     NaN   NaN     NaN
3     1.0     5   Allen       1     5     Joe     NaN   NaN     NaN
4     1.0     5    Jane       1     5   Brian     NaN   NaN     NaN
5     1.0     5    Jane       1     5     Joe     NaN   NaN     NaN
6     NaN   NaN     NaN       0     7   Bryce     0.0     7   White
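One caveat that may be worth checking before relying on the chained merges: the second merge matches df_c against type_2 / id_2, and those columns are NaN for rows that came only from the first frame. With the question's data this does not matter, but a key that exists in the first and third frames and not in the second would end up on two separate rows. The frames below (left, middle, right, and the name 'Green') are invented purely to illustrate that case.

```python
import pandas as pd

# Made-up data: key (0, '3') is in `left` and `right`, but not in `middle`.
left = pd.DataFrame({'type_1': [0], 'id_1': ['3'], 'name_1': ['Alex']})
middle = pd.DataFrame({'type_2': [1], 'id_2': ['4'], 'name_2': ['Bill']})
right = pd.DataFrame({'type_3': [0], 'id_3': ['3'], 'name_3': ['Green']})

# First merge: Alex's row gets NaN in type_2 / id_2.
step = left.merge(middle, how='outer',
                  left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])

# Second merge keys on type_2 / id_2, so (0, '3') from `right` finds no partner.
chained = step.merge(right, how='outer',
                     left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])

# Rows where Alex and Green were actually paired up
paired = chained[chained['name_1'].eq('Alex') & chained['name_3'].eq('Green')]

print(chained)
print('rows pairing Alex with Green:', len(paired))
```

If that situation can occur in your data, joining on a shared key (as in the other answers) avoids it, because the shared key is filled in for every row.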
Answered by YOBEN_S
Let us try creating a new key for this. I am using reduce here:
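import functools
dfs = [df_a, df_b, df_c]
# Build a tuple key from the first two columns (type_N, id_N) of each frame
dfs = [x.assign(key=list(zip(x.iloc[:, 0], x.iloc[:, 1]))) for x in dfs]
merged_df = functools.reduce(lambda left, right: pd.merge(left, right, on='key', how='outer'), dfs)
# Use the columns= keyword; the positional axis argument is gone in recent pandas
merged_df.drop(columns='key')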
Out[110]:
   type_1  id_1  name_1  type_2  id_2  name_2  type_3  id_3  name_3
0     0.0     3    Alex     NaN   NaN     NaN     NaN   NaN     NaN
1     1.0     4     Amy     1.0     4    Bill     1.0     4  School
2     1.0     5   Allen     1.0     5   Brian     NaN   NaN     NaN
3     1.0     5   Allen     1.0     5     Joe     NaN   NaN     NaN
4     1.0     5    Jane     1.0     5   Brian     NaN   NaN     NaN
5     1.0     5    Jane     1.0     5     Joe     NaN   NaN     NaN
6     NaN   NaN     NaN     0.0     7   Bryce     0.0     7   White
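As a sanity check on the seven rows, here is a small sketch that computes the expected row count independently. It assumes that in a full outer join on a shared key, each key contributes the product of its occurrence counts across the frames, with a frame that lacks the key counting as one (NaN-filled) row. The names counts, all_keys, and expected_rows are made up for this illustration.

```python
from collections import Counter
from functools import reduce
from math import prod

import pandas as pd

df_a = pd.DataFrame({'type_1': [0, 1, 1, 1],
                     'id_1': ['3', '4', '5', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0],
                     'id_2': ['4', '5', '5', '7'],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0],
                     'id_3': ['4', '7'],
                     'name_3': ['School', 'White']})
frames = [df_a, df_b, df_c]

# How often each (type, id) pair occurs in each frame
counts = [Counter(zip(f.iloc[:, 0], f.iloc[:, 1])) for f in frames]
all_keys = set().union(*counts)

# A key missing from a frame still yields one row, hence max(count, 1)
expected_rows = sum(prod(max(c[k], 1) for c in counts) for k in all_keys)

# The reduce-based merge from this answer, on a tuple key
keyed = [f.assign(key=list(zip(f.iloc[:, 0], f.iloc[:, 1]))) for f in frames]
merged = reduce(lambda left, right: pd.merge(left, right, on='key', how='outer'),
                keyed).drop(columns='key')

print(expected_rows, len(merged))
```

For the question's data, the key (1, '5') occurs twice in df_a and twice in df_b, which accounts for the four Allen/Jane rows; the other three keys contribute one row each.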