Map RDD to PairRDD in Scala
Original question: http://stackoverflow.com/questions/30655914/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Edamame
I am trying to map an RDD to a pair RDD in Scala so that I can use reduceByKey later. Here is what I did:
userRecords is of type org.apache.spark.rdd.RDD[UserElement].
I tried to create a pair RDD from userRecords like this:
val userPairs: PairRDDFunctions[String, UserElement] = userRecords.map { t =>
  val nameKey: String = t.getName()
  (nameKey, t)
}
However, I got this error:
type mismatch; found : org.apache.spark.rdd.RDD[(String, com.mypackage.UserElement)] required: org.apache.spark.rdd.PairRDDFunctions[String,com.mypackage.UserElement]
What am I missing here? Thanks a lot!
Accepted answer by marios
I think you are just missing the import of org.apache.spark.SparkContext._. This brings the implicit conversions needed to create the PairRDD into scope.
The example below should work (assuming you have initialized a SparkContext as sc):
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.PairRDDFunctions
val f = sc.parallelize(Array(1, 2, 3, 4, 5))
val g: PairRDDFunctions[String, Int] = f.map(x => (x.toString, x))
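As a rough sketch of the end goal in the question (calling reduceByKey), the pairs can also be typed as a plain RDD[(String, Int)] rather than as PairRDDFunctions. This assumes Spark 1.3 or later, where, as far as I know, the implicit conversion is picked up from the RDD companion object without an extra import; on older versions the SparkContext._ import from this answer is still needed. The object name PairRddSketch, the helper sumByKey, and the local master setting are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of the original thread.
object PairRddSketch {
  // Builds (key, value) pairs from the input and sums the values per key.
  def sumByKey(sc: SparkContext, xs: Seq[Int]): Map[String, Int] = {
    val f: RDD[Int] = sc.parallelize(xs)
    // Annotate with the tuple RDD type; no PairRDDFunctions annotation needed.
    val g: RDD[(String, Int)] = f.map(x => (x.toString, x))
    // reduceByKey becomes available through the implicit conversion.
    g.reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this out on one machine.
    val conf = new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      sumByKey(sc, Seq(1, 2, 3, 4, 5)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Leaving the annotation as RDD[(K, V)] (or dropping it entirely) keeps the value usable both as an ordinary RDD and, through the implicit, for the pair operations.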
Answered by Justin Pihony
You don't need to do that, as it is done via implicits (specifically rddToPairRDDFunctions). Any RDD whose elements are of type Tuple2[K, V] can automatically be used as a PairRDDFunctions. If you REALLY want to, you can explicitly do what the implicit does and wrap the RDD in a PairRDDFunctions:
val pair = new PairRDDFunctions(rdd)
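To make the comparison concrete, here is a small sketch that does the same per-key count twice: once by wrapping the RDD explicitly and once by relying on the implicit. The object name ExplicitWrapSketch, both method names, and the word-count data are invented for illustration, and the implicit variant assumes Spark 1.3 or later (older versions need import org.apache.spark.SparkContext._).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.{PairRDDFunctions, RDD}

// Hypothetical helper object, not part of the original thread.
object ExplicitWrapSketch {
  // Explicit version: wrap the tuple RDD by hand, as the implicit would.
  def countsExplicit(sc: SparkContext, words: Seq[String]): Map[String, Int] = {
    val rdd: RDD[(String, Int)] = sc.parallelize(words).map(w => (w, 1))
    val pair: PairRDDFunctions[String, Int] = new PairRDDFunctions(rdd)
    pair.reduceByKey(_ + _).collect().toMap
  }

  // Implicit version: call reduceByKey directly on the tuple RDD.
  def countsImplicit(sc: SparkContext, words: Seq[String]): Map[String, Int] = {
    val rdd: RDD[(String, Int)] = sc.parallelize(words).map(w => (w, 1))
    rdd.reduceByKey(_ + _).collect().toMap
  }
}
```

Both methods should return the same map for the same input; the explicit wrapper mainly helps to see what the compiler inserts for you.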
Answered by Srini
You can also use the keyBy method; you only need to provide a function that extracts the key. In your example, you can simply write userRecords.keyBy(t => t.getName())
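A minimal sketch of keyBy followed by reduceByKey might look like the following. The real UserElement class is not shown in the question, so the case class below (with a name field and a getName() method) is a stand-in, and KeyBySketch and recordsPerName are made-up names. It assumes Spark 1.3 or later for the implicit pair functions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Stand-in for the question's UserElement; the real class is not shown there.
case class UserElement(name: String) {
  def getName(): String = name
}

// Hypothetical helper object, not part of the original thread.
object KeyBySketch {
  // Counts how many records share each name.
  def recordsPerName(sc: SparkContext, users: Seq[UserElement]): Map[String, Int] = {
    val userRecords: RDD[UserElement] = sc.parallelize(users)
    // keyBy yields RDD[(String, UserElement)], keyed by the extracted name.
    val userPairs: RDD[(String, UserElement)] = userRecords.keyBy(t => t.getName())
    // Replace each record with 1, then sum the ones per key.
    userPairs.mapValues(_ => 1).reduceByKey(_ + _).collect().toMap
  }
}
```

keyBy(f) is equivalent to map(t => (f(t), t)), so it produces the same pairs as the map in the question with less code.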

