value join is not a member of org.apache.spark.rdd.RDD

Note: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/29265616/

Tags: scala, apache-spark

Asked by sds

I get this error:

value join is not a member of 
    org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
        forSome { type _0 <: (String, Double) }]

The only suggestion I found is import org.apache.spark.SparkContext._, which I am already doing.

What am I doing wrong?

EDIT: changing the code to eliminate the forSome (i.e., so that the object has the type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) solved the problem. Is this a bug in Spark?

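For illustration, a sketch of one way to eliminate the forSome without touching the upstream code (r stands for the offending RDD; the cast is safe at runtime because the RDD's type parameter is erased, and every _0 is a (String, Double)):

val concrete = r.asInstanceOf[org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]]
concrete.join(concrete) // compiles: K = Long, V = (Int, (Long, String, Array[(String, Double)]))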

Answered by Daniel Darabos

join is a member of org.apache.spark.rdd.PairRDDFunctions. So why does the implicit class not trigger?

scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
 --- because ---
argument expression's type is not compatible with formal parameter type;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)
               ^
<console>:25: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
 required: org.apache.spark.rdd.RDD[(K, V)]
       val p = new org.apache.spark.rdd.PairRDDFunctions(r)

I'm sure that error message is clear to everyone else, but just for my own slow self let's try to make sense of it. PairRDDFunctions has two type parameters, K and V. Your forSome covers the whole pair, so it cannot be split into separate K and V types. There are no K and V such that RDD[(K, V)] would equal your RDD type.

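For reference, the implicit conversion the compiler is looking for has roughly this shape in the Spark 1.x sources (it matches the constructor signature in the error above); K and V must be inferable as two separate types, which the pair-wide forSome prevents:

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null): PairRDDFunctions[K, V]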

However, you could have the forSome apply only to the value, instead of the whole pair. Join works then, because this type can be separated into K and V.

scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26

Answered by V Jaiswal

Consider two Spark RDDs that are to be joined together.

Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is something like (Int, Int) = (25876,1), and the join is supposed to take place on the first field of both RDDs.

scala> rdd1.join(rdd2) --- results in an error:
**: error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]

REASON

Both RDDs must be in the form of key-value pairs.

Here rdd1, being in the form of (1,957,299.98), does not obey this rule, while rdd2, which is in the form of (25876,1), does.

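A minimal sketch of the rule, with made-up data:

scala> val pairs = sc.parallelize(Seq((25876, 1)))           // RDD[(Int, Int)] -- a pair, so join is available
scala> val triples = sc.parallelize(Seq((1, 957, 299.98f)))  // RDD[(Int, Int, Float)] -- not a pair
scala> pairs.join(pairs)    // compiles
scala> triples.join(pairs)  // error: value join is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]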

RESOLUTION

Convert the output of the first RDD from (1,957,299.98) to a key-value pair of the form (1,(957,299.98)) before joining it with rdd2, as shown below:

scala> val rdd1KV = rdd1.map(x => (x.split(",")(0).toInt, (x.split(",")(1).toInt, x.split(",")(2).toFloat))) -- modified RDD, keyed by the first field

scala> rdd1KV.join(rdd2).first -- join successful :)
res**: (Int, ((Int, Float), Int)) = (1,((957,299.98),1))
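
Putting it together as a self-contained sketch (the sample line and the matching key are made up for illustration; rdd1 is assumed to hold raw comma-separated strings):

val rdd1 = sc.parallelize(Seq("1,957,299.98")) // raw CSV-style lines
val rdd2 = sc.parallelize(Seq((1, 1)))         // already an RDD[(Int, Int)]
val rdd1KV = rdd1.map { line =>
  val f = line.split(",")
  (f(0).toInt, (f(1).toInt, f(2).toFloat))     // key by the first field
}
rdd1KV.join(rdd2).collect()                    // Array((1,((957,299.98),1)))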

By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions, so make sure the corresponding import is in place in Eclipse or whichever IDE you run your code from, as sketched below.

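On older Spark releases (before roughly 1.3, when the conversion lived in the SparkContext companion object) that import is:

import org.apache.spark.SparkContext._ // brings rddToPairRDDFunctions into scope

On newer releases the implicit is defined on the RDD companion object and is found automatically, so no import should be needed.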

Article also on my blog:

https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html
