pandas: Joining two pandas dataframes based on multiple conditions

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/53549492/

Date: 2020-09-14 06:11:06  Source: igfitidea

Joining two pandas dataframes based on multiple conditions

Tags: python, pandas, dataframe, merge

Asked by iprof0214

df_a and df_b are two dataframes that look like the following:

df_a
A   B       C      D     E
x1  Apple   0.3   0.9    0.6
x1  Orange  0.1   0.5    0.2
x2  Apple   0.2   0.2    0.1
x2  Orange  0.3   0.4    0.9
x2  Mango   0.1   0.2    0.3
x3  Orange  0.3   0.1    0.2


df_b
A   B_new   F    
x1  Apple   0.3  
x1  Mango   0.2  
x1  Orange  0.1   
x2  Apple   0.2   
x2  Orange  0.3     
x2  Mango   0.1  
x3  Orange  0.3  
x3  Mango   0.2  
x3  Apple   0.1  

I want final_df to contain all the rows of df_a, each matched with the df_b row for the unique combination where df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].

I've tried doing an outer join and then dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.

The following is how I want result_df to look:

result_df

A   B       C     D     E     B_new   F
x1  Apple   0.3   0.9   0.6   Apple   0.3
x1  Orange  0.1   0.5   0.2   Orange  0.1
x2  Apple   0.2   0.2   0.1   Apple   0.2
x2  Orange  0.3   0.4   0.9   Orange  0.3
x2  Mango   0.1   0.2   0.3   Mango   0.1
x3  Orange  0.3   0.1   0.2   Orange  0.3

I also tried a left outer join:

final_df = pd.merge(df_a, df_b, how="left", on=['A'])

The size of this dataframe is a union of df_a and df_b, which is not what I want.
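A minimal sketch of why merging on A alone grows the result, assuming the two frames are rebuilt exactly from the tables in the question: each df_a row is paired with every df_b row that shares its A value, so the row count multiplies instead of staying at six.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Joining on A only: every df_a row matches all three df_b rows with that A.
on_a_only = pd.merge(df_a, df_b, how="left", on=["A"])

print(len(df_a), len(on_a_only))  # 6 18
```

Every A value appears three times in df_b, so the six rows of df_a become eighteen.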

Appreciate any suggestions.


Accepted answer by jpp

You need an inner merge, specifying both merge columns for each dataframe:

res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])

print(res)

    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
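The snippet above assumes df_a and df_b already exist. A self-contained sketch follows, with the frames rebuilt from the question's tables. The validate="one_to_one" argument is an addition that is not part of the original answer; it asks pandas to raise an error if either key pair turns out to be duplicated, which guards the "unique combination" assumption.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange",
              "Mango", "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Match on both conditions: A == A and B == B_new.
# validate= is an extra safety check (not in the original answer).
res = df_a.merge(
    df_b,
    how="inner",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="one_to_one",
)

print(res)
```

Since B and B_new hold the same values after the merge, one of them could be dropped with res.drop(columns="B_new") if only a single fruit column is wanted.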

Answer by Daniel

You can still achieve this with a left join, as long as you join on both columns.
See below:

final_df = pd.merge(df_a, df_b[['A', 'B_new', 'F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
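On the question's data the left and inner joins give the same six rows, because every (A, B) pair in df_a has a match in df_b. The difference only shows when a pair is missing. The sketch below uses trimmed frames plus a hypothetical ("x4", "Kiwi") row, which is not in the original data, and pandas' indicator=True option to mark where each row came from.

```python
import pandas as pd

# Trimmed versions of df_a and df_b. The ("x4", "Kiwi") row is hypothetical,
# added only to show a key pair with no match in df_b.
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x4"],
    "B": ["Apple", "Orange", "Kiwi"],
    "C": [0.3, 0.1, 0.5],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1"],
    "B_new": ["Apple", "Mango", "Orange"],
    "F": [0.3, 0.2, 0.1],
})

keys = dict(left_on=["A", "B"], right_on=["A", "B_new"])

# Left join keeps every df_a row; unmatched rows get NaN in B_new and F.
left = pd.merge(df_a, df_b, how="left", indicator=True, **keys)

# Inner join drops df_a rows that have no match in df_b.
inner = pd.merge(df_a, df_b, how="inner", **keys)

print(left)
print(len(left), len(inner))  # 3 2
```

Which one to use depends on whether unmatched df_a rows should survive: the left join keeps them with missing values, the inner join discards them.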