Python pyspark: merge (outer-join) two data frames
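Disclaimer: this page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow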
Original URL: http://stackoverflow.com/questions/38063657/
Posted: 2020-08-19 20:16:20 Source: igfitidea
Question by Edamame
I have the following two data frames:
DF1:
Id | field_A | field_B | field_C | field_D
1 | cat | 12 | black | 11
2 | dog | 128 | white | 19
3 | dog | 35 | yellow | 20
4 | dog | 21 | brown | 4
5 | bird | 10 | blue | 7
6 | cow | 99 | brown | 34
and
DF2:
Id | field_B | field_C | field_D | field_E
3 | 35 | yellow | 20 | 123
5 | 10 | blue | 7 | 454
6 | 99 | brown | 34 | 398
And I am hoping to get the following new_DF:
Id | field_A | field_B | field_C | field_D | field_E
1 | cat | 12 | black | 11 |
2 | dog | 128 | white | 19 |
3 | dog | 35 | yellow | 20 | 123
4 | dog | 21 | brown | 4 |
5 | bird | 10 | blue | 7 | 454
6 | cow | 99 | brown | 34 | 398
Could this be achieved by data frame operations? Thanks!
Answer by MaxU
Try this:
# Include Id in the join keys as well; otherwise the result carries two separate Id columns
new_df = df1.join(df2, on=['Id', 'field_B', 'field_C', 'field_D'], how='left_outer')