Scala: how to merge two RDDs into one RDD
Original question: http://stackoverflow.com/questions/41120341/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
How to merge two RDDs into one RDD
Asked by Simon
Help, I have two RDDs that I want to merge into one RDD. This is my code.
val us1 = sc.parallelize(Array(("3L"), ("7L"),("5L"),("2L")))
val us2 = sc.parallelize(Array(("432L"), ("7123L"),("513L"),("1312L")))
Answered by T. Gawęda
Answered by Indrajit Swain
You need RDD.union. It does not join on a key; it simply puts the elements of both RDDs into one RDD. Union doesn't really do any work itself (no shuffle is involved), so it is low overhead. Note that the combined RDD will have all the partitions of the original RDDs, so you may want to coalesce after the union.
val x = sc.parallelize(Seq( (1, 3), (2, 4) ))
val y = sc.parallelize(Seq( (3, 5), (4, 7) ))
val z = x.union(y)
z.collect
res0: Array[(Int, Int)] = Array((1,3), (2,4), (3,5), (4,7))
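Applied to the two RDDs from the question, the same call might look like the sketch below, which also prints partition counts to illustrate the note about coalescing. The local master setting, the 2 partitions per input RDD, and the coalesce target of 2 are assumptions chosen for illustration and are not part of the original answer; in spark-shell, `sc` already exists and `getOrCreate` simply returns it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("union-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// The two RDDs from the question, each split into 2 partitions (illustrative choice).
val us1 = sc.parallelize(Seq("3L", "7L", "5L", "2L"), 2)
val us2 = sc.parallelize(Seq("432L", "7123L", "513L", "1312L"), 2)

// union keeps the partitions of both inputs side by side; it does not join on a key.
val merged = us1.union(us2)
val mergedPartitions = merged.getNumPartitions

// coalesce lowers the partition count without a full shuffle.
val compact = merged.coalesce(2)
val compactPartitions = compact.getNumPartitions

val elements = compact.collect()
println(s"partitions: $mergedPartitions -> $compactPartitions")
println(elements.mkString(", "))
```

Whether coalescing is worth it depends on how many partitions the inputs have and what runs afterwards; for small RDDs like these it makes little practical difference.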
API
def ++(other: RDD[T]): RDD[T]
Return the union of this RDD and another one.
def union(other: RDD[T]): RDD[T]
Return the union of this RDD and another one. Any identical elements will appear multiple times (use .distinct() to eliminate them).
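The two API entries above can be compared in a short sketch: `++` behaves like `union`, and duplicates survive until `distinct()` is called. The sample data, with the pair (2, 4) deliberately placed in both RDDs, and the local master setting are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; spark-shell already provides `sc`.
val conf = new SparkConf().setAppName("union-distinct-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Illustrative data: the pair (2, 4) appears in both RDDs.
val a = sc.parallelize(Seq((1, 3), (2, 4)))
val b = sc.parallelize(Seq((2, 4), (4, 7)))

// Both forms return the union of the two RDDs.
val viaUnion = a.union(b)
val viaOperator = a ++ b

// Sorting after collect only makes the printed order predictable.
val withDuplicates = viaOperator.collect().sorted
val deduplicated = viaOperator.distinct().collect().sorted

println(withDuplicates.mkString(", "))
println(deduplicated.mkString(", "))
```

Unlike `union`, `distinct()` has to compare elements across partitions, so it involves a shuffle and is noticeably more expensive on large RDDs.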

