Filter a Spark DataFrame with a variable (Scala)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/43572197/

Filter Spark Dataframe with a variable

Tags: scala, apache-spark, dataframe

Asked by Goby Bala

Is this even possible in a Spark DataFrame (1.6/2.1)?

val data = "some variable"

// does not work: "column1" > data compares two Strings and yields a Boolean,
// which none of the filter overloads accept
df.filter("column1" > data)

I can do this with a static value but can't figure out how to filter by a variable.

Answer by pasha701

import org.apache.spark.sql.functions._

val data = "some variable"
// col() references the column; lit() wraps the Scala value in a Column literal
df.filter(col("column1") > lit(data))
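
For context, here is a minimal self-contained sketch of this approach (the toy DataFrame, its contents, and the SparkSession setup are illustrative assumptions, not part of the original answer):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
import spark.implicits._

// toy DataFrame; column1 holds strings, so > compares lexicographically
val df = Seq("apple", "some variable", "zebra").toDF("column1")

val data = "some variable"
// lit() turns the Scala value into a Column literal so it can be compared to col("column1")
df.filter(col("column1") > lit(data)).show() // keeps only "zebra"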

Answer by Vidya

I'm not sure how you accomplished that with a literal either, since what you have doesn't match any of the filter method signatures.

So yes, you can work with a non-literal, but try this:

import sparkSession.implicits._
df.filter($"column1" > data)

Note the $, which uses an implicit conversion to turn the String into a Column named with that String. Meanwhile, this Column has a > method that takes an Any and returns a new Column. That Any will be your data value.
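To make the equivalence concrete, a short sketch (assuming the implicits import above and some DataFrame df with a column1 column): all three forms below build the same predicate.

import org.apache.spark.sql.functions.col

df.filter($"column1" > data)      // implicit String-to-Column via the $ interpolator
df.filter(col("column1") > data)  // explicit col() from org.apache.spark.sql.functions
df.filter(df("column1") > data)   // column resolved against this specific DataFrame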

Answer by Satish Karuturi

In Java, we can do it like this:

import org.apache.spark.sql.functions;

int i = 10;

// equality
df.select("column1", "column2").filter(functions.col("column1").equalTo(i)).show();

// greater than / less than
df.select("no", "name").filter(functions.col("no").gt(i)).show();
df.select("no", "name").filter(functions.col("no").lt(i)).show();

Answer by Abu Shoeb

Yes, you can use a variable to filter a Spark DataFrame.

import org.apache.spark.sql.functions.lower

val keyword = "my_key_word"
// var keyword = "my_key_word" // use var instead if the value needs to be reassigned

df.filter($"column1".contains(keyword))
df.filter(lower($"column1").contains(keyword)) // case-insensitive, assuming keyword is lowercase

Answer by Ram Ghadiyaram

Here is a complete demo of filtering with <, > and === on a numeric column, where mysearchid is a number declared as a val below...

scala> val numRows = 10

scala> val ds = spark.range(0, numRows)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val df = ds.toDF("index")
df: org.apache.spark.sql.DataFrame = [index: bigint]

scala> df.show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
+-----+


scala> val mysearchid = 9
mysearchid: Int = 9

scala> println("filter with less than ")
filter with less than

scala> df.filter(df("index") < mysearchid).show
+-----+
|index|
+-----+
|    0|
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
+-----+


scala> println("filter with greater than ")
filter with greater than

scala> df.filter(df("index") > mysearchid).show
+-----+
|index|
+-----+
+-----+


scala> println("filter with equals ")
filter with equals

scala> df.filter(df("index") === mysearchid).show
+-----+
|index|
+-----+
|    9|
+-----+

Answer by 7kemZmani

You can simply do it using string interpolation:

val data = "some variable"
df.filter(s"column1 > $data")
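
One caveat: this expression string is handed to Spark's SQL parser, so a string value has to be quoted inside the interpolation, while a numeric value needs no quotes:

val data = "some variable"
df.filter(s"column1 > '$data'") // string literal must be quoted for the SQL parser

val n = 5
df.filter(s"column1 > $n")      // numeric literal needs no quotes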

Answer by Rajiv Singh

import org.apache.spark.sql.functions._
import spark.implicits._ // for the $ column interpolator

val portfolio_name = "Product"

// $"portfolio_name" refers to the column; s"$portfolio_name" just interpolates
// the val (plain portfolio_name on the right-hand side would work equally well)
spark.sql("SELECT * FROM Test")
  .filter($"portfolio_name" === s"$portfolio_name")
  .show(100)