
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/23193611/

Date: 2020-10-22 06:13:01  Source: igfitidea

Equivalent to left outer join in SPARK

scala, apache-spark

Asked by user3279189

Is there a left outer join equivalent in Spark Scala? I understand there is a join operation which is equivalent to a database inner join.


Answered by MARK

Spark Scala does support left outer joins. Have a look at the JavaPairRDD API here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.api.java.JavaPairRDD


Usage is quite simple:


rdd1.leftOuterJoin(rdd2)

Answered by Thang Tran

It is as simple as rdd1.leftOuterJoin(rdd2), but you have to make sure both RDDs are in (key, value) form, i.e. every element of each RDD is a pair.

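To illustrate what leftOuterJoin returns, here is a minimal plain-Scala sketch of the same semantics over ordinary sequences (the helper function and sample data are hypothetical, not Spark APIs; Spark's distributed leftOuterJoin on pair RDDs produces values of the same shape, (V, Option[W])):

```scala
// Sketch of left-outer-join semantics on (key, value) pairs.
// Unmatched left keys survive with None; matched keys pair up with Some(...).
def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
  // Index the right side by key (a key may map to several values).
  val rightByKey: Map[K, Seq[W]] =
    right.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
  left.flatMap { case (k, v) =>
    rightByKey.get(k) match {
      case Some(ws) => ws.map(w => (k, (v, Option(w))))  // one output row per match
      case None     => Seq((k, (v, Option.empty[W])))    // keep unmatched left row
    }
  }
}

// Hypothetical sample data.
val employees = Seq((1, "alice"), (2, "bob"), (3, "carol"))
val depts     = Seq((1, "eng"), (2, "sales"))
val joined    = leftOuterJoin(employees, depts)
// joined: (1,(alice,Some(eng))), (2,(bob,Some(sales))), (3,(carol,None))
```

Note the Option[W] on the right side of each result: that is exactly how Spark signals that a left key had no match.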

Answered by gaganbm

Yes, there is. Have a look at the DStream APIs; they provide left as well as right outer joins.


If you have a stream of some type, let's say 'Record', and you wish to join two streams of records, then you can do it like this:


var res: DStream[(Long, (Record, Option[Record]))] = left.leftOuterJoin(right)

As the APIs say, the left and right streams have to be hash partitioned. That is, you can take some attributes from a Record (or derive it in any other way) to calculate a hash value and convert the stream to a pair DStream. The left and right streams will be of type DStream[(Long, Record)] before you call that join function. (This is just an example; the hash type can be some type other than Long as well.)

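The keying step described above can be sketched in plain Scala (the Record fields, hashKey function, and sample data are all hypothetical; in a real Spark Streaming job the two keyed collections would be DStream[(Long, Record)] and the join would be left.leftOuterJoin(right)):

```scala
// Hypothetical record type and key function.
case class Record(id: String, payload: String)
def hashKey(r: Record): Long = r.id.hashCode.toLong  // derive a Long key from the record

// Key both sides, mirroring the (Long, Record) pair shape the answer describes.
val left  = Seq(Record("a", "left-1"), Record("b", "left-2")).map(r => (hashKey(r), r))
val right = Seq(Record("a", "right-1")).map(r => (hashKey(r), r))

// A local stand-in for leftOuterJoin; assumes unique keys on the right side.
val rightByKey: Map[Long, Record] = right.toMap
val res: Seq[(Long, (Record, Option[Record]))] =
  left.map { case (k, rec) => (k, (rec, rightByKey.get(k))) }
// record "a" pairs with Some(right record); record "b" gets None
```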

Answered by Tagar

Spark SQL / Data Frame API also supports LEFT/RIGHT/FULL outer joins directly:


https://spark.apache.org/docs/latest/sql-programming-guide.html


Because of this bug (https://issues.apache.org/jira/browse/SPARK-11111), outer joins in Spark prior to 1.6 could be very slow (unless the data sets being joined are really small). Before 1.6, Spark used a Cartesian product followed by filtering; now it uses SortMergeJoin instead.
