Filter a Spark DataFrame with a variable (Scala)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/43572197/

Filter Spark Dataframe with a variable

Tags: scala, apache-spark, dataframe

Asked by Goby Bala

Is this even possible in a Spark DataFrame (1.6/2.1)?

val data = "some variable"

// does not work: "column1" > data compares two Strings and yields a Boolean,
// which none of the filter overloads accept
df.filter("column1" > data)

I can do this with a static value but can't figure out how to filter by a variable.

Answer by pasha701

import org.apache.spark.sql.functions._

val data = "some variable"
// col() references the column; lit() wraps the Scala value in a Column literal
df.filter(col("column1") > lit(data))
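
For context, here is a minimal self-contained sketch of this approach (the toy DataFrame, its contents, and the SparkSession setup are illustrative assumptions, not part of the original answer):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
import spark.implicits._

// toy DataFrame; column1 holds strings, so > compares lexicographically
val df = Seq("apple", "some variable", "zebra").toDF("column1")

val data = "some variable"
// lit() turns the Scala value into a Column literal so it can be compared to col("column1")
df.filter(col("column1") > lit(data)).show() // keeps only "zebra"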

Answer by Vidya

I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures.

So yes, you can work with a non-literal, but try this:

import sparkSession.implicits._
df.filter($"column1" > data)

Note the $, which uses an implicit conversion to turn the String into a Column named with that String. Meanwhile, this Column has a > method that takes an Any and returns a new Column. That Any will be your data value.
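To make the equivalence concrete, a short sketch (assuming the implicits import above and some DataFrame df with a column1 column): all three forms below build the same predicate.

import org.apache.spark.sql.functions.col

df.filter($"column1" > data)      // implicit String-to-Column via the $ interpolator
df.filter(col("column1") > data)  // explicit col() from org.apache.spark.sql.functions
df.filter(df("column1") > data)   // column resolved against this specific DataFrame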

Answer by Satish Karuturi

In Java, we can do it like this:

import org.apache.spark.sql.functions;

int i = 10;

// equality
df.select("column1", "column2").filter(functions.col("column1").equalTo(i)).show();

// greater than / less than
df.select("no", "name").filter(functions.col("no").gt(i)).show();
df.select("no", "name").filter(functions.col("no").lt(i)).show();

Answer by Abu Shoeb

Yes, you can use a variable to filter a Spark DataFrame.

import org.apache.spark.sql.functions.lower

val keyword = "my_key_word"
// var keyword = "my_key_word" // use var instead if the value needs to be reassigned

df.filter($"column1".contains(keyword))
df.filter(lower($"column1").contains(keyword)) // case-insensitive, assuming keyword is lowercase

Answer by Ram Ghadiyaram

Here is a complete demo of filtering with <, > and === on a numeric column, where mysearchid is a number declared as a val below...

scala> val numRows = 10

scala> val ds = spark.range(0, numRows)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val df = ds.toDF("index")
df: org.apache.spark.sql.DataFrame = [index: bigint]

scala> df.show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
+-----+


scala> val mysearchid = 9
mysearchid: Int = 9

scala> println("filter with less than ")
filter with less than

scala> df.filter(df("index") < mysearchid).show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
+-----+


scala> println("filter with greater than ")
filter with greater than

scala> df.filter(df("index") > mysearchid).show
+-----+
|index|
+-----+
+-----+


scala> println("filter with equals ")
filter with equals

scala> df.filter(df("index") === mysearchid).show
+-----+
|index|
+-----+
|    9|
+-----+

Answer by 7kemZmani

You can simply do it using string interpolation:

val data = "some variable"
df.filter(s"column1 > $data")
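
One caveat: this expression string is handed to Spark's SQL parser, so a string value has to be quoted inside the interpolation, while a numeric value needs no quotes:

val data = "some variable"
df.filter(s"column1 > '$data'") // string literal must be quoted for the SQL parser

val n = 5
df.filter(s"column1 > $n")      // numeric literal needs no quotes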

Answer by Rajiv Singh

import org.apache.spark.sql.functions._
import spark.implicits._ // for the $ column interpolator

val portfolio_name = "Product"

// $"portfolio_name" refers to the column; s"$portfolio_name" just interpolates
// the val (plain portfolio_name on the right-hand side would work equally well)
spark.sql("SELECT * FROM Test")
  .filter($"portfolio_name" === s"$portfolio_name")
  .show(100)