Spark java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.util.ArrayList

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/40764957/

Spark java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.util.ArrayList

scala, apache-spark, apache-spark-sql, spark-dataframe

Asked by Vivek Narayanasetty

Spark throws a ClassCastException when performing any operation on a WrappedArray.

Example: I have a map output like below.

Output:

Map(1 -> WrappedArray(Pan4), 2 -> WrappedArray(Pan15), 3 -> WrappedArray(Pan16, Pan17, Pan18), 4 -> WrappedArray(Pan19, Pan1, Pan2, Pan3, Pan4, Pan5, Pan6))]

When map.values is invoked, it prints the output below:

MapLike(WrappedArray(Pan4), WrappedArray(Pan15), WrappedArray(Pan16, Pan17, Pan18), WrappedArray(Pan19, Pan1, Pan2, Pan3, Pan4, Pan5, Pan6))

An exception is thrown if map.values.map(arr => arr) or map.values.forEach { value => println(value) } is invoked.

I am not able to perform any operation on the wrapped array. I just need the size of the elements present in each WrappedArray.

Error StackTrace
------------------
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to java.util.ArrayList
    at WindowTest$CustomMedian$$anonfun.apply(WindowTest.scala:176)
    at WindowTest$CustomMedian$$anonfun.apply(WindowTest.scala:176)
    at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:244)
    at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at WindowTest$CustomMedian.evaluate(WindowTest.scala:176)
    at org.apache.spark.sql.execution.aggregate.ScalaUDAF.eval(udaf.scala:446)
    at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun.apply(AggregationIterator.scala:376)
    at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun.apply(AggregationIterator.scala:368)
    at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:154)
    at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:29)
    at scala.collection.Iterator$$anon.hasNext(Iterator.scala:389)
    at scala.collection.Iterator$$anon.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:212)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun.apply(SparkPlan.scala:212)
    at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Answered by Vivek Narayanasetty

Resolved the error by converting to Seq (sequence type).

Earlier:

val bufferMap: Map[Int, util.ArrayList[String]] = buffer.getAs[Map[Int, util.ArrayList[String]]](1)

Modified:

val bufferMap: Map[Int, Seq[String]] = buffer.getAs[Map[Int, Seq[String]]](1)
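
A minimal sketch of how the sizes asked for in the question can then be taken, assuming (as in the answer above) that buffer is the UDAF aggregation Row and the map is stored at index 1; WrappedArray already conforms to Seq, so no cast to java.util.ArrayList is needed:

// read the array-valued map field as Scala collections
val bufferMap: Map[Int, Seq[String]] = buffer.getAs[Map[Int, Seq[String]]](1)

// size of each WrappedArray, keyed by the original map key
val sizes: Map[Int, Int] = bufferMap.map { case (k, v) => k -> v.size }

// iterating over the values no longer throws ClassCastException
bufferMap.values.foreach(value => println(value))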

Answered by Prudvi Sagar

Try the below:

map.values.array.forEach { value => println(value) }

array is a method on WrappedArray; it returns Array[T], where T is the type of the elements in the WrappedArray.
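
For illustration, a self-contained sketch of the same idea using toArray, which is available on any Seq, including WrappedArray; the map literal here is hypothetical and mirrors the output shown in the question:

import scala.collection.mutable.WrappedArray

// hypothetical map mirroring the question's output (Scala 2.11/2.12 collections)
val m: Map[Int, Seq[String]] = Map(
  1 -> WrappedArray.make[String](Array("Pan4")),
  3 -> WrappedArray.make[String](Array("Pan16", "Pan17", "Pan18"))
)

m.values.foreach { value =>
  val arr: Array[String] = value.toArray // copy the WrappedArray into a plain Array
  println(arr.length)                    // size of each WrappedArray
}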

Answered by Tomás Denis Reyes Sánchez

For those using Spark from Java: encode the dataset into an object instead of using Row and then the getAs method.

Suppose this dataset, which has some random information about a machine:

+-----------+------------+------------+-----------+---------+--------------------+
|epoch      |     RValues|     SValues|    TValues|      ids|               codes|
+-----------+------------+------------+-----------+---------+--------------------+
| 1546297225| [-1.0, 5.0]|  [2.0, 6.0]| [3.0, 7.0]|   [2, 3]|[MRT0000020611, M...|
| 1546297226| [-1.0, 3.0]| [-6.0, 6.0]| [3.0, 4.0]|   [2, 3]|[MRT0000020611, M...|
| 1546297227| [-1.0, 4.0]|[-8.0, 10.0]| [3.0, 6.0]|   [2, 3]|[MRT0000020611, M...|
| 1546297228| [-1.0, 6.0]|[-8.0, 11.0]| [3.0, 5.0]|   [2, 3]|[MRT0000020611, M...|
+-----------+------------+------------+-----------+---------+--------------------+

Instead of having a Dataset&lt;Row&gt;, create a Dataset&lt;MachineLog&gt; that complies with this dataset's column definition, and create the MachineLog class. When doing a transformation, use the .as(Encoders.bean(MachineLog.class)) method to define the encoder.

For example:

spark.createDataset(dataset.rdd(), Encoders.bean(MachineLog.class));

But converting from a Dataset to an RDD is not recommended. Try to use the as method:

Dataset<MachineLog> mLog = spark.read().parquet("...").as(Encoders.bean(MachineLog.class));

It can also be used after a transformation.

Dataset<MachineLog> machineLogDataset = aDataset
                .join(
                        otherDataset,
                        functions.col("...").eqNullSafe("...")
                )
                .as(Encoders.bean(MachineLog.class));

Take into account that the MachineLog class must obey the serialization rules (i.e., it needs an explicit empty constructor, getters, and setters).
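
For reference, a rough Scala sketch of the same approach (the language of the original question), where a case class plus the implicit encoders from spark.implicits._ plays the role of the Java bean, so getAs and the WrappedArray cast are avoided entirely; the field names and types are assumed from the example dataset shown above:

import org.apache.spark.sql.SparkSession

// field names/types assumed from the example dataset above
case class MachineLog(epoch: Long,
                      RValues: Seq[Double],
                      SValues: Seq[Double],
                      TValues: Seq[Double],
                      ids: Seq[Int],
                      codes: Seq[String])

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._ // brings in Encoders for case classes

// typed Dataset straight from the source, no Row.getAs needed
val machineLogs = spark.read.parquet("...").as[MachineLog]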