Scala DataFrame error: "overloaded method value filter with alternatives"

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37334915/

Tags: scala, apache-spark, dataframe

Asked by Edamame

I am trying to create a new data frame by filtering out the rows in which a field is null or an empty string, using the code below:

val df1 = df.filter(df("fieldA") != "").cache()

Then I got the following error:

 <console>:32: error: overloaded method value filter with alternatives:
      (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
      (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
     cannot be applied to (Boolean)
                  val df1 = df.filter(df("fieldA") != "").cache()
                                 ^

Does anyone know what I missed here? Thanks!

Answered by Daniel de Paula

In Scala, to compare columns for equality or inequality, you should use === and !== (or =!= in Spark 2.0+, where !== is deprecated):

val df1 = df.filter(df("fieldA") !== "").cache()

Alternatively, you can use an expression:

val df1 = df.filter("fieldA != ''").cache()
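Since the question mentions both null and empty values, the two forms above can be put side by side in a small program. This is only a sketch under assumptions: it presumes Spark 2.0+ on the classpath (for SparkSession and =!=), and the object name FilterNonEmpty, the helper names, the id column, and the sample rows are hypothetical and not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterNonEmpty {
  // Column-based filter: =!= builds a Column, so this matches filter(condition: Column).
  def byColumn(df: DataFrame): DataFrame =
    df.filter(df("fieldA").isNotNull && (df("fieldA") =!= ""))

  // Expression-based filter: the String is parsed as a SQL condition.
  def byExpression(df: DataFrame): DataFrame =
    df.filter("fieldA IS NOT NULL AND fieldA != ''")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration; the question does not show how df was built.
    val spark = SparkSession.builder().master("local[*]").appName("filter-non-empty").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows: a normal value, an empty string, and a null.
    val df = Seq[(Int, String)]((1, "a"), (2, ""), (3, null)).toDF("id", "fieldA")

    byColumn(df).show()
    byExpression(df).show()

    spark.stop()
  }
}
```

Under SQL semantics a comparison against null evaluates to null, so filter drops those rows even without the explicit null check; isNotNull is included here only to make the intent visible.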

Your error happened because the != operator is defined on every Scala object and is used to compare objects, always returning a Boolean. However, the filter function expects either a Column object or an expression in a String. That is why the Column class has the !== operator, which returns another Column that can then be used in the way you want.
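The type mismatch can be reproduced without Spark. The sketch below uses a toy stand-in class, ToyColumn, and a pair of overloaded filter methods that only mimic the two signatures from the error message; all of these names are hypothetical and this is not how Spark implements Column.

```scala
// Toy stand-in for Spark's Column, for illustration only (not Spark's real implementation).
final case class ToyColumn(expr: String) {
  // Returns a new ToyColumn describing the comparison instead of evaluating it.
  def =!=(other: Any): ToyColumn = ToyColumn(s"($expr != '$other')")
}

object OperatorDemo {
  // Mimic the two overloads named in the error message.
  def filter(conditionExpr: String): String = s"filter by expression: $conditionExpr"
  def filter(condition: ToyColumn): String = s"filter by column: ${condition.expr}"

  def main(args: Array[String]): Unit = {
    val fieldA = ToyColumn("fieldA")
    val emptyValue: Any = ""

    // != is the universal comparison: it is evaluated immediately and yields a Boolean.
    val plain: Boolean = fieldA != emptyValue
    println(plain)
    // filter(plain) would not compile: neither overload accepts a Boolean.

    // =!= is defined on ToyColumn and yields another ToyColumn, which filter accepts.
    val asColumn: ToyColumn = fieldA =!= emptyValue
    println(filter(asColumn))
  }
}
```

The compiler picks an overload from the argument type, so the only change needed in the original code is an operator whose result type is a Column rather than a Boolean.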

To see all operations available for columns, the Column scaladoc is very useful. There is also the functions package.
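As one possible use of the functions package, the sketch below combines col, trim, and length from org.apache.spark.sql.functions with Column operators. Treating whitespace-only strings as empty is an assumption that goes beyond the original question, and the names FunctionsFilter and dropBlank are hypothetical; Spark 2.0+ is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, trim}

object FunctionsFilter {
  // Keeps rows whose column is non-null and still has characters after trimming.
  // Assumption: whitespace-only values should count as empty.
  def dropBlank(df: DataFrame, column: String): DataFrame =
    df.filter(col(column).isNotNull && (length(trim(col(column))) > 0))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration.
    val spark = SparkSession.builder().master("local[*]").appName("functions-filter").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows: normal, empty, whitespace-only, and null values.
    val df = Seq[(Int, String)]((1, "a"), (2, ""), (3, "  "), (4, null)).toDF("id", "fieldA")

    dropBlank(df, "fieldA").show()

    spark.stop()
  }
}
```

Every operator here (isNotNull, &&, >) is a Column method that returns a Column, so the whole condition fits the filter(condition: Column) overload.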