Original question: http://stackoverflow.com/questions/24046744/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
java+spark: org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException
Asked by user2810081
I'm new to Spark, and was trying to run the example JavaSparkPi.java. It runs well, but because I have to use it from another Java class, I copied everything from main into a method of the class and called that method from main. Now it says:
org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException
The code looks like this:
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.VoidFunction;

public class JavaSparkPi {

    public void cal() {
        JavaSparkContext jsc = new JavaSparkContext("local", "JavaLogQuery");
        int slices = 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }

        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        System.out.println("count is: " + dataSet.count());
        dataSet.foreach(new VoidFunction<Integer>() {
            public void call(Integer i) {
                System.out.println(i);
            }
        });

        int count = dataSet.map(new Function<Integer, Integer>() {
            @Override
            public Integer call(Integer integer) throws Exception {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        }).reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer integer, Integer integer2) throws Exception {
                return integer + integer2;
            }
        });

        System.out.println("Pi is roughly " + 4.0 * count / n);
    }

    public static void main(String[] args) throws Exception {
        JavaSparkPi myClass = new JavaSparkPi();
        myClass.cal();
    }
}
Does anyone have an idea about this? Thanks!
Accepted answer by Daniel Darabos
The nested functions hold a reference to the containing object (JavaSparkPi), so that object will get serialized. For this to work, it needs to be serializable. Simple to do:
public class JavaSparkPi implements Serializable {
...
Answered by Anuj J
The main problem is that when you create an anonymous class in Java, it is passed a reference to the enclosing class. This can be fixed in several ways:
Declare the enclosing class Serializable
This works in your case, but it will fall flat if your enclosing class has a field that is not serializable. I would also say that serializing the parent class is a total waste.
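As an illustration (a hedged sketch, not from the original answer; the field is hypothetical), a non-serializable field such as the JavaSparkContext itself can be marked transient so that serialization skips it:

import java.io.Serializable;
import org.apache.spark.api.java.JavaSparkContext;

public class JavaSparkPi implements Serializable {
    // transient fields are skipped during serialization; JavaSparkContext
    // is not serializable and only needs to live on the driver
    private transient JavaSparkContext jsc;
    ...
}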
Create the Closure in a static function
Creating the closure inside a static function means the anonymous class has no enclosing instance to capture, so nothing else needs to be made serializable.
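A minimal sketch of this approach (the helper name sampleInCircle is made up for illustration); in Spark's Java API the Function interface itself extends Serializable, so the anonymous class serializes on its own:

// inside JavaSparkPi
private static Function<Integer, Integer> sampleInCircle() {
    // created in a static context: no enclosing JavaSparkPi instance is
    // captured, so Spark only serializes this anonymous function
    return new Function<Integer, Integer>() {
        @Override
        public Integer call(Integer integer) throws Exception {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return (x * x + y * y < 1) ? 1 : 0;
        }
    };
}

// usage inside cal():
// int count = dataSet.map(sampleInCircle()).reduce(...);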
Answered by Jeetu
This error comes about because you have multiple physical CPUs in your local machine or cluster, and the Spark engine tries to send this function to multiple CPUs over the network. Your function
dataSet.foreach(new VoidFunction<Integer>() {
    public void call(Integer i) {
        System.out.println(i);   // <-- the call the answer highlights
    }
});
uses println(), which is not serializable, so the Spark engine throws the exception. The solution is to use the following instead:
// collect() first brings the data back to the driver as a java.util.List,
// whose forEach takes a java.util.function.Consumer rather than Spark's VoidFunction
dataSet.collect().forEach(new java.util.function.Consumer<Integer>() {
    @Override
    public void accept(Integer i) {
        System.out.println(i);
    }
});