Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/28233405/

How to flatten a list inside an RDD?

Tags: scala, apache-spark

Asked by zork

Is it possible to flatten a list inside an RDD? For example, to convert:

 val xxx: org.apache.spark.rdd.RDD[List[Foo]]

to:

 val yyy: org.apache.spark.rdd.RDD[Foo]

How can I do this?

Answer by Shyamendra Solanki

val rdd = sc.parallelize(Array(List(1,2,3), List(4,5,6), List(7,8,9), List(10, 11, 12)))
// org.apache.spark.rdd.RDD[List[Int]] = ParallelCollectionRDD ...

val rddi = rdd.flatMap(list => list)
// rddi: org.apache.spark.rdd.RDD[Int] = FlatMappedRDD ...

// This is the same as rdd.flatMap(identity).
// identity is a method defined in the Predef object:
//    def identity[A](x: A): A

rddi.collect()
// res2: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

Answer by maasg

You just need to flatten it, but since RDD has no explicit 'flatten' method, you can do this:

rdd.flatMap(identity)
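
To see why flatMap(identity) can stand in for a missing flatten, here is a minimal sketch on plain Scala collections rather than Spark, so it runs without a cluster. The object and value names (FlattenDemo, nested, viaFlatten, viaFlatMap) are made up for illustration; List does have its own flatten, which makes the comparison possible.

```scala
object FlattenDemo {
  // A nested local collection standing in for the contents of an RDD[List[Int]].
  val nested: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))

  // List has a built-in flatten ...
  val viaFlatten: List[Int] = nested.flatten

  // ... and flatMap with the identity function produces the same elements,
  // because each inner list is returned unchanged and then concatenated.
  val viaFlatMap: List[Int] = nested.flatMap(identity)

  def main(args: Array[String]): Unit = {
    println(viaFlatten)
    println(viaFlatMap)
  }
}
```

The same reasoning carries over to RDD.flatMap, which concatenates whatever collection the function returns for each element.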

Answer by Xavier Guihot

You could "pimp" the RDD class (the enrich-my-library pattern) to attach a .flatten method, mirroring the List API:

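import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object SparkHelper {
  // Adds .flatten to RDD[Seq[T]]. RDD is invariant in its element type,
  // so an RDD[List[T]] is not picked up by this conversion as written.
  implicit class SeqRDDExtensions[T: ClassTag](val rdd: RDD[Seq[T]]) {
    def flatten: RDD[T] = rdd.flatMap(identity)
  }
}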

After importing SparkHelper._, it can then be used simply as:

rdd.flatten
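
For a fuller picture, here is a hedged end-to-end sketch of that usage. It assumes Spark is on the classpath and runs against a local master; the FlattenUsage object, the flattenLocally helper, and the app name are hypothetical names chosen for this example. SparkHelper is repeated from the answer so that the block is self-contained. The input is typed as RDD[Seq[Int]] so that the implicit class applies.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Repeated from the answer so this sketch compiles on its own.
object SparkHelper {
  implicit class SeqRDDExtensions[T: ClassTag](val rdd: RDD[Seq[T]]) {
    def flatten: RDD[T] = rdd.flatMap(identity)
  }
}

object FlattenUsage {
  import SparkHelper._ // brings the .flatten extension into scope

  // Hypothetical helper: builds a small nested RDD locally and flattens it.
  def flattenLocally(): Array[Int] = {
    val conf = new SparkConf().setAppName("flatten-usage").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Explicit RDD[Seq[Int]] annotation so the implicit class matches.
      val nested: RDD[Seq[Int]] = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6)))
      nested.flatten.collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(flattenLocally().mkString(", "))
}
```

Stopping the context in a finally block keeps repeated runs in the same JVM from colliding with a SparkContext that is still active.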