pandas: full outer join of two or more DataFrames

Question

Asked by ps0604

Given the following three pandas DataFrames, I need to merge them in a way similar to a SQL full outer join. Note that the key is a composite of two columns, type_N and id_N, with N = 1, 2, 3:

import pandas as pd

raw_data = {
        'type_1': [0, 1, 1,1],
        'id_1': ['3', '4', '5','5'],
        'name_1': ['Alex', 'Amy', 'Allen', 'Jane']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'id_1', 'name_1' ])

raw_datab = {
        'type_2': [1, 1, 1, 0],
        'id_2': ['4', '5', '5', '7'],
        'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'id_2', 'name_2'])

raw_datac = {
        'type_3': [1, 0],
        'id_3': ['4', '7'],
        'name_3': ['School', 'White']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'id_3', 'name_3'])

The expected result should be:

type_1   id_1   name_1   type_2   id_2   name_2   type_3   id_3   name_3
0        3      Alex     NaN      NaN    NaN      NaN      NaN    NaN
1        4      Amy      1        4      Bill     1        4      School
1        5      Allen    1        5      Brian    NaN      NaN    NaN
1        5      Allen    1        5      Joe      NaN      NaN    NaN
1        5      Jane     1        5      Brian    NaN      NaN    NaN
1        5      Jane     1        5      Joe      NaN      NaN    NaN
NaN      NaN    NaN      0        7      Bryce    0        7      White

How can this be achieved in Pandas?

Answer 1

Answered by piRSquared

I'll propose that you make life less complicated and not have different names for the things you want to merge on.

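# Give every frame the same index names so the join keys line up.
da = df_a.set_index(['type_1', 'id_1']).rename_axis(['type', 'id'])
db = df_b.set_index(['type_2', 'id_2']).rename_axis(['type', 'id'])
dc = df_c.set_index(['type_3', 'id_3']).rename_axis(['type', 'id'])

# Keep the joined result as d; the later snippet in this answer uses that name.
d = da.join(db, how='outer').join(dc, how='outer')
d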

        name_1 name_2  name_3
type id                      
0    3    Alex    NaN     NaN
     7     NaN  Bryce   White
1    4     Amy   Bill  School
     5   Allen  Brian     NaN
     5   Allen    Joe     NaN
     5    Jane  Brian     NaN
     5    Jane    Joe     NaN

Here's an obnoxious way to get the original type_N and id_N columns back:

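from cytoolz.dicttoolz import merge

# d is the result of the outer join; i holds its index levels as ordinary columns.
i = pd.DataFrame(d.index.values.tolist(), d.index, d.index.names)
d = d.assign(**merge(
    # Blank out type/id wherever frame j contributed no row, then add the _j suffix.
    i.mask(d[f'name_{j}'].isna()).add_suffix(f'_{j}').to_dict('list')
    for j in [1, 2, 3]
))

# Order the columns by suffix first, then by name.
d[sorted(d.columns, key=lambda x: x.split('_')[::-1])]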

        id_1 name_1  type_1 id_2 name_2  type_2 id_3  name_3  type_3
type id                                                             
0    3     3   Alex     0.0  NaN    NaN     NaN  NaN     NaN     NaN
     7   NaN    NaN     NaN    7  Bryce     0.0    7   White     0.0
1    4     4    Amy     1.0    4   Bill     1.0    4  School     1.0
     5     5  Allen     1.0    5  Brian     1.0  NaN     NaN     NaN
     5     5  Allen     1.0    5    Joe     1.0  NaN     NaN     NaN
     5     5   Jane     1.0    5  Brian     1.0  NaN     NaN     NaN
     5     5   Jane     1.0    5    Joe     1.0  NaN     NaN     NaN
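
The join step in this answer extends to any number of frames. Below is a minimal sketch, assuming every frame follows the question's type_N / id_N naming; the helper name `to_shared_index` is made up for illustration and is not part of pandas.

```python
from functools import reduce

import pandas as pd

df_a = pd.DataFrame({'type_1': [0, 1, 1, 1],
                     'id_1': ['3', '4', '5', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0],
                     'id_2': ['4', '5', '5', '7'],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0],
                     'id_3': ['4', '7'],
                     'name_3': ['School', 'White']})


def to_shared_index(df, n):
    """Hypothetical helper: move type_n/id_n into an index named (type, id)."""
    return df.set_index([f'type_{n}', f'id_{n}']).rename_axis(['type', 'id'])


# Number the frames from 1 so the suffixes match the column names.
frames = [to_shared_index(df, n)
          for n, df in enumerate([df_a, df_b, df_c], start=1)]

# Fold the list with successive outer joins on the shared index.
joined = reduce(lambda left, right: left.join(right, how='outer'), frames)
print(joined)
```

Because duplicate keys on both sides multiply, the two (1, '5') rows in df_a and the two in df_b produce four rows in the result, as in the expected output of the question.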

Answer 2

Answered by user3483203

You could use two consecutive merges, first of df_a with df_b, and then of that result with df_c:

In [49]: df_temp = df_a.merge(df_b, how='outer', left_on=['type_1', 'id_1'], right_on=['type_2', 'id_2'])

In [50]: df_temp.merge(df_c, how='outer', left_on=['type_2', 'id_2'], right_on=['type_3', 'id_3'])
Out[50]:
   type_1 id_1 name_1 type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex    NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy      1    4   Bill     1.0    4  School
2     1.0    5  Allen      1    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen      1    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane      1    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane      1    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN      0    7  Bryce     0.0    7   White
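
To check which side each row of the first merge came from, `merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below is only a sanity check on the question's data, not part of the original answer.

```python
import pandas as pd

df_a = pd.DataFrame({'type_1': [0, 1, 1, 1],
                     'id_1': ['3', '4', '5', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0],
                     'id_2': ['4', '5', '5', '7'],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})

# indicator=True labels each row as 'left_only', 'right_only' or 'both'.
df_temp = df_a.merge(df_b, how='outer',
                     left_on=['type_1', 'id_1'],
                     right_on=['type_2', 'id_2'],
                     indicator=True)

counts = df_temp['_merge'].value_counts()
print(counts)
```

Drop the `_merge` column (or omit `indicator`) before the second merge if you do not want it in the final result.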

Answer 3

Answered by YOBEN_S

Let us try creating a new key for this; I am using reduce here:

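import functools
dfs=[df_a,df_b,df_c]
# Build a tuple key from the first two columns (type_N, id_N) of each frame.
dfs=[x.assign(key=list(zip(x.iloc[:,0],x.iloc[:,1]))) for x in dfs]
# Outer-merge the frames pairwise on the shared key.
merged_df = functools.reduce(lambda left,right: pd.merge(left,right,on='key',how='outer'), dfs)
# Newer pandas versions no longer accept the axis as a positional argument to drop.
merged_df.drop(columns='key')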
Out[110]: 
   type_1 id_1 name_1  type_2 id_2 name_2  type_3 id_3  name_3
0     0.0    3   Alex     NaN  NaN    NaN     NaN  NaN     NaN
1     1.0    4    Amy     1.0    4   Bill     1.0    4  School
2     1.0    5  Allen     1.0    5  Brian     NaN  NaN     NaN
3     1.0    5  Allen     1.0    5    Joe     NaN  NaN     NaN
4     1.0    5   Jane     1.0    5  Brian     NaN  NaN     NaN
5     1.0    5   Jane     1.0    5    Joe     NaN  NaN     NaN
6     NaN  NaN    NaN     0.0    7  Bryce     0.0    7   White
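
A variant of the same idea, in case you would rather avoid a tuple-valued column: copy the key into two shared columns and merge on those. This is a sketch under the assumption that the first two columns of every frame are the type/id key; the helper `with_shared_key` and the column names `type` and `id` are made up for illustration. Merging on a shared key also means a row in a later frame can match a key that appeared only in the first frame, which chaining on type_2 / id_2 would not pick up.

```python
import functools

import pandas as pd

df_a = pd.DataFrame({'type_1': [0, 1, 1, 1],
                     'id_1': ['3', '4', '5', '5'],
                     'name_1': ['Alex', 'Amy', 'Allen', 'Jane']})
df_b = pd.DataFrame({'type_2': [1, 1, 1, 0],
                     'id_2': ['4', '5', '5', '7'],
                     'name_2': ['Bill', 'Brian', 'Joe', 'Bryce']})
df_c = pd.DataFrame({'type_3': [1, 0],
                     'id_3': ['4', '7'],
                     'name_3': ['School', 'White']})


def with_shared_key(df):
    """Hypothetical helper: copy the first two columns into 'type' and 'id'."""
    return df.assign(type=df.iloc[:, 0], id=df.iloc[:, 1])


dfs = [with_shared_key(x) for x in (df_a, df_b, df_c)]

# Outer-merge pairwise on the shared key columns.
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on=['type', 'id'], how='outer'),
    dfs,
)

# The helper columns are no longer needed once the merge is done.
merged = merged.drop(columns=['type', 'id'])
print(merged)
```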
