How to select all columns of a dataframe in join - Spark-scala
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/37780748/
Asked by user2895589
I am joining 2 data frames and selecting all columns of the left frame, for example:
val join_df = first_df.join(second_df, first_df("id") === second_df("id"), "left_outer")
In the above I want to do select first_df.*. How can I select all columns of one frame in the join?
Answered by
With an alias:
first_df.alias("fst").join(second_df, Seq("id"), "left_outer").select("fst.*")
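For reference, here is a self-contained sketch of this approach; the SparkSession setup and the sample rows are assumptions added for illustration, not part of the original question:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("alias-join").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for first_df and second_df.
val first_df  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
val second_df = Seq((1, "engineering")).toDF("id", "dept")

// Alias the left frame, join on the shared key, then keep only its columns.
val join_df = first_df.alias("fst")
  .join(second_df, Seq("id"), "left_outer")
  .select("fst.*")

join_df.show()   // only the columns of first_df (id, name) appear

Because the join uses Seq("id"), the key column is not duplicated in the output, and "fst.*" still resolves to all of first_df's columns.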
Answered by Bryan Johnson
Suppose you:
- Want to use the DataFrame syntax.
- Want to select all columns from df1 but only a couple from df2.
- Find it cumbersome to list the df1 columns out explicitly because there are so many of them.
Then, you might do the following:
val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))
df1.join(df2, df1("key") === df2("key")).select(selectColumns:_*)
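As a concrete illustration, here is a runnable version of the same idea with invented dataframes; key, field1, and field2 are placeholder names carried over from the snippet above:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("select-columns").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up dataframes: df1 has several columns, df2 has two we care about.
val df1 = Seq((1, "a", 10), (2, "b", 20)).toDF("key", "colA", "colB")
val df2 = Seq((1, "x", "y")).toDF("key", "field1", "field2")

// Every column of df1, plus just the two columns we want from df2.
val selectColumns = df1.columns.map(df1(_)) ++ Array(df2("field1"), df2("field2"))

df1.join(df2, df1("key") === df2("key")).select(selectColumns: _*).show()

Because each entry in selectColumns is a Column bound to its source dataframe, the ambiguity between the two key columns in the joined result is avoided.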
Answered by Keshav Prashanth
We can also do it with a leftsemi join. A leftsemi join keeps only the rows of the left dataframe that have a match in the right dataframe, and returns only the left dataframe's columns.
Here we join two dataframes df1 and df2 based on column col1.
df1.join(df2, df1.col("col1").equalTo(df2.col("col1")), "leftsemi")
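Below is a minimal sketch of the leftsemi variant with made-up data, just to show the effect; the contents of df1 and df2 are assumptions for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("leftsemi-join").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data: only the df1 row whose col1 matches df2 should survive.
val df1 = Seq((1, "kept"), (2, "dropped")).toDF("col1", "value")
val df2 = Seq((1, "anything")).toDF("col1", "other")

// Left semi join: keeps df1 rows that have a match in df2 and returns df1's columns only.
val result = df1.join(df2, df1.col("col1") === df2.col("col1"), "leftsemi")
result.show()   // row (1, "kept") remains; the unmatched row and df2's columns do not appear

Note that a leftsemi join can never return columns from df2; if you also need some of those, use one of the approaches above.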

