Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/32900862/
Map can not be serializable in scala?
Asked by Carter
I am new to Scala. How come the "map" function is not serializable? How to make it serializable? For example, if my code is like below:
val data = sc.parallelize(List(1, 4, 3, 5, 2, 3, 5))

def myfunc(iter: Iterator[Int]): Iterator[Int] = {
  val lst = List(("a", 1), ("b", 2), ("c", 3), ("a", 2))
  var res = List[Int]()
  while (iter.hasNext) {
    val cur = iter.next
    val a = lst.groupBy(x => x._1).mapValues(_.size)
    // val b = a.map(x => x._2)
    res = res ::: List(cur)
  }
  res.iterator
}

data.mapPartitions(myfunc).collect
If I uncomment the line
val b= a.map(x => x._2)
The code returns an exception:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon
Serialization stack:
- object not serializable (class: scala.collection.immutable.MapLike$$anon, value: Map(1 -> 3))
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: a, type: interface scala.collection.immutable.Map)
Thank you very much.
Answered by Eugene Zhulenev
It's a well-known Scala bug: https://issues.scala-lang.org/browse/SI-7005 (Map#mapValues is not serializable).
We have this problem in our Spark apps; map(identity) solves the problem:
rdd.groupBy(_.segment).mapValues(v => ...).map(identity)
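A plain-Scala sketch of what the fix does (no Spark required; the object and method names here are my own, not from the answer): mapValues returns a lazy view, and appending map(identity) forces a strict immutable.Map that survives Java serialization, which is what Spark needs when shipping a closure to executors. On Scala 2.13+, mapValues is deprecated; the trailing .toMap below keeps the sketch working on both lines.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object MapIdentityFix {
  // Serialize and deserialize an object the way Spark's Java serializer
  // would; this throws NotSerializableException for the lazy mapValues view.
  def roundTrip(obj: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)).readObject()
  }

  def main(args: Array[String]): Unit = {
    val lst = List(("a", 1), ("b", 2), ("c", 3), ("a", 2))
    // map(identity) (plus toMap for Scala 2.13+) materializes the
    // mapValues view into an ordinary, serializable immutable.Map.
    val counts = lst.groupBy(_._1).mapValues(_.size).map(identity).toMap
    println(roundTrip(counts) == counts)  // true: the strict map round-trips
  }
}
```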
Answered by Kshitij Kulshrestha
The actual implementation of the mapValues function is provided below. As you can see, it is not serializable: it creates only a view over the original map rather than a collection that actually holds the data, and hence you are getting this error. Depending on the situation, mapValues has many advantages.
protected class MappedValues[C](f: B => C) extends AbstractMap[A, C] with DefaultMap[A, C] {
  override def foreach[D](g: ((A, C)) => D): Unit = for ((k, v) <- self) g((k, f(v)))
  def iterator = for ((k, v) <- self.iterator) yield (k, f(v))
  override def size = self.size
  override def contains(key: A) = self.contains(key)
  def get(key: A) = self.get(key).map(f)
}
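A minimal, self-contained sketch (plain Scala; the names are my own) of the view behaviour described above: because get(key) applies f on every lookup and no values are ever stored, the function passed to mapValues is re-run on each access, while map(identity) (with toMap for Scala 2.13+) produces a strict map that computes each value once.

```scala
object MapValuesViewDemo {
  def main(args: Array[String]): Unit = {
    var calls = 0
    val lst = List(("a", 1), ("b", 2), ("c", 3), ("a", 2))

    // `a` is a lazy view: values are neither precomputed nor cached.
    val a = lst.groupBy(_._1).mapValues { g => calls += 1; g.size }

    a("a")
    a("a")
    println(calls)  // 2: the same value was computed twice

    // Materializing evaluates each of the three values exactly once.
    val strict = a.map(identity).toMap
    println(strict == Map("a" -> 2, "b" -> 1, "c" -> 1))  // true
  }
}
```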
Answered by Jason Lenderman
Have you tried running this same code in an application? I suspect this is an issue with the Spark shell. If you want to make it work in the Spark shell, then you might try wrapping the definition of myfunc and its application in curly braces, like so:
val data = sc.parallelize(List(1, 4, 3, 5, 2, 3, 5))
val result = {
  def myfunc(iter: Iterator[Int]): Iterator[Int] = {
    val lst = List(("a", 1), ("b", 2), ("c", 3), ("a", 2))
    var res = List[Int]()
    while (iter.hasNext) {
      val cur = iter.next
      val a = lst.groupBy(x => x._1).mapValues(_.size)
      val b = a.map(x => x._2)
      res = res ::: List(cur)
    }
    res.iterator
  }
  data.mapPartitions(myfunc).collect
}

