Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43735132/

Time: 2020-09-14 03:31:29  Source: igfitidea

Merge two different dataframes on different column names

Tags: python, pandas, numpy, merge

Asked by user1017373

I have two dataframes:

import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})

df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

Now I need to get the rows where columns A and B of df1 match columns A and CC of df2. So I tried the possible merge options, such as:

both_DFS = pd.merge(df1, df2, how='left', left_on=['A', 'B'], right_on=['A', 'CC'])

This does not give me the row information from df2, which is what I need. That is, I get all the column names from df2, but their rows are just empty or NaN.
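One way to see which rows are responsible for the NaN values is to ask merge to report, row by row, whether the key pair was found on both sides. The following is a minimal sketch using the two frames from the question and the `indicator=True` option of `pd.merge`; the variable name `checked` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# indicator=True adds a `_merge` column: 'both' when the (A, B) pair of df1
# was found as an (A, CC) pair in df2, 'left_only' when it was not.
checked = pd.merge(df1, df2, how='left',
                   left_on=['A', 'B'], right_on=['A', 'CC'],
                   indicator=True)
print(checked[['A', 'B', 'CC', '_merge']])
```

Rows marked `left_only` are exactly the ones whose df2 columns are filled with NaN by the left join.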

And then I tried:

Both_DFs = pd.merge(df1, df2, how='left', left_on=['A', 'B'], right_on=['A', 'CC'])[['A', 'B', 'CC']]

And this gives me the error:

KeyError: "['B'] not in index"

I am aiming for a merged DataFrame with all columns from both df1 and df2. Any suggestions would be great.

Desired output:

 Both_DFs
    A   B   C   BB  CC  DD
0   A1  121 K0  B0  121 D0

So in my dataframes (df1 and df2), only one row has an exact match on both columns of interest. That is, only one row of columns A and B in df1 exactly matches a row of columns A and CC in df2.

Answered by zipa

Well, if you set column A as the index, it works:

# Recent pandas versions reject left_on/right_on combined with
# left_index/right_index, so only the index (column A) is used as the key.
Both_DFs = pd.merge(df1.set_index('A'), df2.set_index('A'), how='left',
                    left_index=True, right_index=True).dropna().reset_index()

This results in:

    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
2  A3  146  K1  B3  345  D1

EDIT

You just needed:

Both_DFs = pd.merge(df1, df2, how='left', left_on=['A', 'B'], right_on=['A', 'CC']).dropna()

Which gives:

    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
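One caveat: `dropna()` removes every row that contains a NaN anywhere, including rows where df1 itself has a missing value. If that is not wanted, a possible alternative is to filter on the `_merge` column that `indicator=True` adds. This is only a sketch; the `None` placed in column C is made-up data, added purely to show the difference, and the names `via_dropna` and `via_indicator` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': [None, 'K1', 'K0', 'K1']})  # None is hypothetical data
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

merged = pd.merge(df1, df2, how='left',
                  left_on=['A', 'B'], right_on=['A', 'CC'],
                  indicator=True)

# dropna() also discards the matched row, because C is missing there.
via_dropna = merged.dropna()

# Filtering on `_merge` keeps every row whose key pair was found in df2.
via_indicator = merged[merged['_merge'] == 'both'].drop(columns='_merge')

print(len(via_dropna), len(via_indicator))
```

With this input the matched row survives only in `via_indicator`.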

Answered by jezrael

You can also use join (which defaults to a left join) or merge, and then, if necessary, remove the rows containing NaNs with dropna:

print(df1.join(df2.set_index('A'), on='A').dropna())
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
3  A3  146  K1  B3  345  D1


print(pd.merge(df1, df2, on='A', how='left').dropna())
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
1  A1  345  K1  B0  121  D0
3  A3  146  K1  B3  345  D1

EDIT:

I think you need an inner join (it is the default, so how='inner' can be omitted):

Both_DFs = pd.merge(df1, df2, left_on=['A', 'B'], right_on=['A', 'CC'])
print(Both_DFs)
    A    B   C  BB   CC  DD
0  A1  121  K0  B0  121  D0
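Because the key columns have different names, the result carries both B and CC, and after an inner join they hold identical values. If only one of them is needed, a possible follow-up (not part of the original answer) is to drop the redundant column after the merge. The names `both` and `tidy` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['121', '345', '123', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# Inner join (the default) on the key pairs (A, B) and (A, CC).
both = pd.merge(df1, df2, left_on=['A', 'B'], right_on=['A', 'CC'])

# B and CC are equal in every matched row, so one of them can be dropped.
tidy = both.drop(columns='CC')
print(tidy)
```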

Answered by Jérémy Caré

I don't know if your example shows your exact problem, but:

If we merge on two keys, as with a MultiIndex, both keys need to match:

df1['A'] == df2['A'] and df1['B'] == df2['CC']

Here no row matches on both keys.
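The matching condition can be checked directly by comparing the key pairs of the two frames. This sketch assumes df1 holds the B values that appear in this answer's output tables (123, 345, 121, 146), which differ from the frame listed in the question; `left_keys`, `right_keys` and `common` are illustrative names.

```python
import pandas as pd

# Assumed input: B values taken from the output tables in this answer.
df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['123', '345', '121', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

# Build the set of key pairs on each side and intersect them.
left_keys = set(zip(df1['A'], df1['B']))
right_keys = set(zip(df2['A'], df2['CC']))
common = left_keys & right_keys

print(common)  # set(): no pair occurs in both frames
```

An empty intersection means a left join on both keys can only produce NaN for the df2 columns.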

If we merge on just column A, we get something like this:

Both_DFs = pd.merge(df1, df2, how='left', left_on=['A'], right_on=['A'])

    A    B   C   BB   CC   DD
0  A1  123  K0   B0  121   D0
1  A1  345  K1   B0  121   D0
2  A2  121  K0  NaN  NaN  NaN
3  A3  146  K1   B3  345   D1

If you want to remove the rows that are not in df2, change the 'how' argument to 'inner':

Both_DFs = pd.merge(df1, df2, how='inner', left_on=['A'], right_on=['A'])
    A    B   C   BB   CC   DD
0  A1  123  K0   B0  121   D0
1  A1  345  K1   B0  121   D0
2  A3  146  K1   B3  345   D1
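The difference between the two join types can be seen by running both on the same key. As before, this is a sketch that assumes the df1 values shown in this answer's tables; `left` and `inner` are illustrative names.

```python
import pandas as pd

# Assumed input: B values taken from the output tables in this answer.
df1 = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                    'B': ['123', '345', '121', '146'],
                    'C': ['K0', 'K1', 'K0', 'K1']})
df2 = pd.DataFrame({'A': ['A1', 'A3'],
                    'BB': ['B0', 'B3'],
                    'CC': ['121', '345'],
                    'DD': ['D0', 'D1']})

left = pd.merge(df1, df2, how='left', on='A')    # keeps every df1 row
inner = pd.merge(df1, df2, how='inner', on='A')  # keeps only matched rows

print(len(left), len(inner))  # 4 3

# Keys kept by the left join but dropped by the inner join.
print(sorted(set(left['A']) - set(inner['A'])))  # ['A2']
```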

Is this approach what you're looking for?