Attribution: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/36562678/

Spark's Column.isin function does not take List

Tags: java, scala, apache-spark

Asked by Jake Fund

I am trying to filter out rows from my Spark DataFrame.

val sequence = Seq(1,2,3,4,5)
df.filter(df("column").isin(sequence))

Unfortunately, I get an "unsupported literal type" error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(1,2,3,4,5)

According to the documentation, it takes a scala.collection.Seq.

I guess I don't want a literal? Then what can I take in, some sort of wrapper class?

Answered by eliasah

@JustinPihony's answer is correct, but it is incomplete. The isin function takes a repeated parameter (varargs) as its argument, so you need to expand the sequence with the : _* type ascription when passing it, like so:

scala> val df = sc.parallelize(Seq(1,2,3,4,5,6,7,8,9)).toDF("column")
// df: org.apache.spark.sql.DataFrame = [column: int]

scala> val sequence = Seq(1,2,3,4,5)
// sequence: Seq[Int] = List(1, 2, 3, 4, 5)

scala> val result = df.filter(df("column").isin(sequence : _*))
// result: org.apache.spark.sql.DataFrame = [column: int]

scala> result.show
// +------+
// |column|
// +------+
// |     1|
// |     2|
// |     3|
// |     4|
// |     5|
// +------+
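
To see why the bare Seq is rejected, it may help to look at how a repeated parameter behaves in plain Scala, without Spark. The sketch below is only an illustration: the isin method here is a hypothetical stand-in with the same parameter shape (Any*), not Spark's implementation.

```scala
object VarargsDemo {
  // Hypothetical stand-in for Column.isin: a method with a repeated parameter.
  def isin(value: Any, list: Any*): Boolean = list.contains(value)

  def main(args: Array[String]): Unit = {
    val sequence = Seq(1, 2, 3, 4, 5)

    // Without ": _*" the whole Seq is treated as ONE argument,
    // so the method receives a single element that is itself a List.
    println(isin(3, sequence))

    // With ": _*" the Seq is expanded into the repeated parameter,
    // so the method receives five Int elements.
    println(isin(3, sequence: _*))
  }
}
```

In the first call the method sees a one-element argument list containing the List itself, which is presumably what Spark then tries (and fails) to turn into a literal. In the second call the elements are passed individually.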

Answered by Justin Pihony

This is happening because the underlying Scala implementation uses varargs, so the documentation in Java is not quite correct. It is using the @varargs annotation, so you can just pass in an array.
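
As a rough sketch of what the annotation does: scala.annotation.varargs asks the compiler to also emit a Java-style varargs forwarder, which is what lets Java callers pass an array directly. From Scala, an Array is not expanded automatically and still needs : _*. FakeColumn below is a hypothetical class used only for illustration, not Spark's Column.

```scala
import scala.annotation.varargs

// Hypothetical stand-in for Spark's Column; only the method shape matters here.
class FakeColumn(value: Any) {
  // @varargs additionally generates a Java-callable isin(Object...) forwarder.
  @varargs
  def isin(list: Any*): Boolean = list.contains(value)
}

object VarargsAnnotationDemo {
  def main(args: Array[String]): Unit = {
    val col = new FakeColumn(3)
    val values: Array[Int] = Array(1, 2, 3, 4, 5)

    // From Scala, the Array still has to be expanded explicitly.
    println(col.isin(values: _*))
  }
}
```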
