Original URL: http://stackoverflow.com/questions/53549492/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Joining two pandas dataframes based on multiple conditions
Question by iprof0214
df_a and df_b are two dataframes that look like the following:
df_a
A   B       C    D    E
x1  Apple   0.3  0.9  0.6
x1  Orange  0.1  0.5  0.2
x2  Apple   0.2  0.2  0.1
x2  Orange  0.3  0.4  0.9
x2  Mango   0.1  0.2  0.3
x3  Orange  0.3  0.1  0.2
df_b
A   B_new   F
x1  Apple   0.3
x1  Mango   0.2
x1  Orange  0.1
x2  Apple   0.2
x2  Orange  0.3
x2  Mango   0.1
x3  Orange  0.3
x3  Mango   0.2
x3  Apple   0.1
I want my final_df to contain all the rows of df_a, each matched on the unique combination of df_a['A'] == df_b['A'] and df_a['B'] == df_b['B_new'].
I've tried doing an outer join and then dropping duplicates with respect to columns A and B in final_df, but the value of B_new is not retained.
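The question does not say which key the outer join used. Assuming it was keyed on column A only (a guess, not stated by the asker), this sketch shows why dropping duplicates afterwards loses the correct B_new: drop_duplicates keeps whichever df_b partner happens to come first for each (A, B) pair, not the one whose B_new equals B. The names joined and dedup are illustrative.

```python
import pandas as pd

# Sample frames copied from the question
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Assumption: the outer join was keyed on column A only
joined = pd.merge(df_a, df_b, how="outer", on="A")

# Keeps the first row per (A, B); its B_new is just the first df_b row
# sharing that A value, so it often does not equal B
dedup = joined.drop_duplicates(subset=["A", "B"])
print(dedup[["A", "B", "B_new", "F"]])
```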
This is how I want my result_df to look:
result_df
A   B       C    D    E    B_new   F
x1  Apple   0.3  0.9  0.6  Apple   0.3
x1  Orange  0.1  0.5  0.2  Orange  0.1
x2  Apple   0.2  0.2  0.1  Apple   0.2
x2  Orange  0.3  0.4  0.9  Orange  0.3
x2  Mango   0.1  0.2  0.3  Mango   0.1
x3  Orange  0.3  0.1  0.2  Orange  0.3
I also tried a left outer join:
final_df = pd.merge(df_a, df_b, how="left", on=['A'])
The size of this dataframe is a union of df_a and df_b, which is not what I want.
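Joining on A alone pairs every df_a row with every df_b row that shares the same A value, so the result grows as a per-key product rather than a union. A minimal sketch on the sample data, where expected is an illustrative helper count: 2 * 3 rows for x1, 3 * 3 for x2 and 1 * 3 for x3 give 18 rows, more than the 6 + 9 rows of the two inputs combined.

```python
import pandas as pd

# Sample frames copied from the question
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# The left join from the question, keyed on A only
final_df = pd.merge(df_a, df_b, how="left", on=["A"])

# Per key: (rows in df_a with that A) * (rows in df_b with that A)
expected = sum(
    n * (df_b["A"] == key).sum()
    for key, n in df_a["A"].value_counts().items()
)
print(len(df_a), len(df_b), len(final_df), expected)  # prints: 6 9 18 18
```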
I'd appreciate any suggestions.
Accepted answer by jpp
You need an inner merge, specifying both merge columns in each case:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
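If df_b ever contained the same (A, B_new) pair twice, this inner merge would silently duplicate the matching df_a row. One way to guard against that, not part of the original answer, is merge's validate parameter: validate="many_to_one" asks pandas to check that the right-hand keys are unique and raise pandas.errors.MergeError otherwise. A sketch, where dup_b and raised are illustrative names:

```python
import pandas as pd

# Sample frames copied from the question
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# The inner merge from the answer, plus a uniqueness check on df_b's keys
res = df_a.merge(
    df_b,
    how="inner",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    validate="many_to_one",
)
print(res)

# Hypothetical: append a copy of df_b's first row so (x1, Apple) appears twice
dup_b = pd.concat([df_b, df_b.iloc[[0]]], ignore_index=True)
raised = False
try:
    df_a.merge(
        dup_b,
        how="inner",
        left_on=["A", "B"],
        right_on=["A", "B_new"],
        validate="many_to_one",
    )
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)
```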
Answer by Daniel
You can still achieve this with a left join, which works well here:
final_df = pd.merge(df_a, df_b[['A', 'B_new', 'F']], how="left", left_on=['A', 'B'], right_on=['A', 'B_new'])
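On the sample data the left join and the inner join return the same six rows, because every df_a row has a partner in df_b. They differ only when a df_a row has no match: a left join keeps it with NaN in B_new and F, while an inner join drops it. A sketch using merge's indicator=True to flag such rows; the extra ("x3", "Kiwi") row and the names extra, df_a_extra and unmatched are hypothetical, added only for illustration.

```python
import pandas as pd

# Sample frames copied from the question
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2", "x2", "x2", "x3"],
    "B": ["Apple", "Orange", "Apple", "Orange", "Mango", "Orange"],
    "C": [0.3, 0.1, 0.2, 0.3, 0.1, 0.3],
    "D": [0.9, 0.5, 0.2, 0.4, 0.2, 0.1],
    "E": [0.6, 0.2, 0.1, 0.9, 0.3, 0.2],
})
df_b = pd.DataFrame({
    "A": ["x1", "x1", "x1", "x2", "x2", "x2", "x3", "x3", "x3"],
    "B_new": ["Apple", "Mango", "Orange", "Apple", "Orange", "Mango",
              "Orange", "Mango", "Apple"],
    "F": [0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 0.3, 0.2, 0.1],
})

# Hypothetical extra row with no (A, B_new) partner in df_b
extra = pd.DataFrame({"A": ["x3"], "B": ["Kiwi"], "C": [0.5], "D": [0.5], "E": [0.5]})
df_a_extra = pd.concat([df_a, extra], ignore_index=True)

final_df = pd.merge(
    df_a_extra,
    df_b[["A", "B_new", "F"]],
    how="left",
    left_on=["A", "B"],
    right_on=["A", "B_new"],
    indicator=True,  # adds a _merge column; here "both" or "left_only"
)

# Rows of df_a_extra that found no partner in df_b (NaN in B_new and F)
unmatched = final_df[final_df["_merge"] == "left_only"]
print(unmatched[["A", "B", "B_new", "F"]])
```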