How to fix java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List to field type scala.collection.Seq?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/39953245/

java, apache-spark, spark-cassandra-connector

Asked by user1870400

This error has been the hardest to trace. I am not sure what is going on. I am running a Spark cluster on my local machine, so the entire Spark cluster is under one host, 127.0.0.1, and I run it in standalone mode.

JavaPairRDD<byte[], Iterable<CassandraRow>> cassandraRowsRDD = javaFunctions(sc).cassandraTable("test", "hello")
   .select("rowkey", "col1", "col2", "col3")
   .spanBy(new Function<CassandraRow, byte[]>() {
        @Override
        public byte[] call(CassandraRow v1) {
            return v1.getBytes("rowkey").array();
        }
    }, byte[].class);

Iterable<Tuple2<byte[], Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect(); //ERROR HAPPENS HERE
Tuple2<byte[], Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
byte[] partitionKey = tuple._1();
for(CassandraRow cassandraRow: tuple._2()) {
    System.out.println("************START************");
    System.out.println(new String(partitionKey));
    System.out.println("************END************");
}

This error has been the hardest to trace. It clearly happens at cassandraRowsRDD.collect(), and I don't know why.

16/10/09 23:36:21 ERROR Executor: Exception in task 2.3 in stage 0.0 (TID 21)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Here are the versions I use:

Scala code runner version 2.11.8  // when I run scala -version or even ./spark-shell


compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0'
compile group: 'org.apache.spark', name: 'spark-streaming_2.11', version: '2.0.0'
compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.0.0'
compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.0-M3'

My Gradle file looks like this after introducing a configuration called "provided", which doesn't actually exist out of the box, but Google said to create one, so my build.gradle looks like this:

group 'com.company'
version '1.0-SNAPSHOT'

apply plugin: 'java'
apply plugin: 'idea'

repositories {
    mavenCentral()
    mavenLocal()
}

configurations {
    provided
}
sourceSets {
    main {
        compileClasspath += configurations.provided
        test.compileClasspath += configurations.provided
        test.runtimeClasspath += configurations.provided
    }
}

idea {
    module {
        scopes.PROVIDED.plus += [ configurations.provided ]
    }
}

dependencies {
    compile 'org.slf4j:slf4j-log4j12:1.7.12'
    provided group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0'
    provided group: 'org.apache.spark', name: 'spark-streaming_2.11', version: '2.0.0'
    provided group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.0.0'
    provided group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.0-M3'
}



jar {
    from { configurations.provided.collect { it.isDirectory() ? it : zipTree(it) } }
   // with jar
    from sourceSets.test.output
    manifest {
        attributes 'Main-Class': "com.company.batchprocessing.Hello"
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
    zip64 true
}
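
A side note on the build file above: Gradle 2.12 and later ship a built-in compileOnly configuration in the java plugin, which covers the same compile-time-only need as the hand-rolled provided configuration. A hedged sketch of the dependencies block under that assumption; since compileOnly dependencies are not packaged, the Cassandra connector would still need to reach the cluster another way (for example via spark-submit --jars):

dependencies {
    compile 'org.slf4j:slf4j-log4j12:1.7.12'
    // Spark itself is provided by the cluster, so keep it off the application's runtime classpath
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0'
    compileOnly group: 'org.apache.spark', name: 'spark-streaming_2.11', version: '2.0.0'
    compileOnly group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.0.0'
    // not part of the Spark distribution; ship it separately (e.g. --jars or a fat jar)
    compileOnly group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.0-M3'
}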

Answered by Holger Brandl

I had the same issue and could resolve it by adding my application's jar to Spark's classpath with:

spark = SparkSession.builder()
        .appName("Foo")
        .config("spark.jars", "target/scala-2.11/foo_2.11-0.1.jar")
        .getOrCreate();   // builder completed for context; the relevant setting is spark.jars
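
Since the question drives Spark through a JavaSparkContext, a hedged equivalent of the same fix via SparkConf is sketched below; the jar path is copied from the snippet above and is an assumption about your build layout:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Ship the application jar to the executors so they can deserialize the classes and lambdas it defines
SparkConf conf = new SparkConf()
        .setAppName("Foo")
        .setJars(new String[]{"target/scala-2.11/foo_2.11-0.1.jar"});
JavaSparkContext sc = new JavaSparkContext(conf);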

Answered by Ambling

I have hit the same exception and have dug into multiple related JIRAs (SPARK-9219, SPARK-12675, SPARK-18075).

I believe that the exception name is confusing, and the real problem is the inconsistent environment settings between the Spark cluster and the driver application.

For example, I started my Spark cluster with the following line in conf/spark-defaults.conf:

spark.master                     spark://master:7077

while I started my driver program (even when the program is launched with spark-submit) with the line:

sparkSession.master("spark://<master ip>:7077")

in which <master ip> is the correct IP address of the master node, but the program would fail due to this simple inconsistency.

As a result, I would recommend that all driver applications be started with spark-submit, and do not duplicate any configuration in the driver code (unless you need to override some config). Namely, just let spark-submit set up your environment the same way as the running Spark cluster, as in the sketch below.
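
A minimal sketch of that recommendation, using the main class named in the question's manifest; the jar path and master URL are assumptions, not taken from the answer:

import org.apache.spark.sql.SparkSession;

// Launched via spark-submit, which supplies the master URL and ships the jar, e.g.:
//   spark-submit --class com.company.batchprocessing.Hello \
//     --master spark://master:7077 build/libs/batchprocessing-1.0-SNAPSHOT.jar
public final class Hello {
    public static void main(String[] args) {
        // No .master(...) call here: the driver inherits whatever spark-submit was given,
        // so the driver and the cluster cannot drift apart.
        SparkSession spark = SparkSession.builder()
                .appName("Hello")
                .getOrCreate();
        // ... job logic ...
        spark.stop();
    }
}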

Answered by abaghel

Your call() method should return byte[], like below.

@Override
public byte[] call(CassandraRow v1) {
  return v1.getBytes("rowkey").array();
}

If you still get the issue, check the versions of your dependencies as mentioned in SPARK-9219: https://issues.apache.org/jira/browse/SPARK-9219
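
A hedged way to see which Spark and Scala artifact versions Gradle actually resolves for the build in the question (standard Gradle tasks; the "provided" configuration name comes from the question's build.gradle):

# Print the resolved dependency tree for every configuration
./gradlew dependencies

# Drill into a single artifact, e.g. the Scala runtime pulled in transitively
./gradlew dependencyInsight --dependency scala-library --configuration provided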

Answered by Nikita Bosik

In my case I had to add the spark-avro jar (I put it into a /lib folder next to the main jar):

SparkSession spark = SparkSession.builder().appName("myapp").getOrCreate();
...
spark.sparkContext().addJar("lib/spark-avro_2.11-4.0.0.jar");

Answered by Valeriy K.

Check your code. In IntelliJ: Analyze... -> Inspect Code. If you have deprecated methods related to serialization, fix them. Or simply try to downgrade your Spark or Scala version; in my case I downgraded the Scala version to 2.10 and everything worked.
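
If you try that with the Gradle build from the question, every _2.xx artifact suffix has to change together; a hedged two-line sketch, assuming a matching _2.10 build of each artifact is actually published for the versions you pick:

provided group: 'org.apache.spark', name: 'spark-core_2.10', version: '2.0.0'
provided group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.10', version: '2.0.0-M3'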

Answered by shiyong

Not using .master("spark://hadoop001:7077") and using .master("local[2]") instead solved my problem.
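
For completeness, a minimal sketch of that workaround in the question's Java setup (the app name is a placeholder). Running on local[2] keeps everything inside the driver JVM, which sidesteps the serialization mismatch rather than fixing it:

import org.apache.spark.sql.SparkSession;

// local[2] runs Spark locally with two worker threads; no application jar has to be shipped to remote executors
SparkSession spark = SparkSession.builder()
        .appName("batchprocessing-local")
        .master("local[2]")
        .getOrCreate();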