Original URL: http://stackoverflow.com/questions/26755230/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
dynamically bind variable/parameter in Spark SQL?
Asked by user3769729
How can I bind a variable in Apache Spark SQL? For example:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)
Answered by Tagar
Spark SQL (as of 1.6 release) does not support bind variables.
P.S. What Ashrith is suggesting is not a bind variable: you're constructing a new string every time, so every time Spark will parse the query, create an execution plan, etc. The purpose of bind variables (in RDBMS systems, for example) is to cut the time spent creating the execution plan (which can be costly when there are a lot of joins, etc.). Spark would have to have a special API to "parse" a query once and then "bind" variables to it. Spark does not have this functionality (as of today, the Spark 1.6 release).
Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
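For illustration, here is a minimal sketch of the string-construction workaround referred to above, using the table and column names from the question. This is not a bind variable; a fresh SQL string is built on each call:

val val1 = 42  // an ordinary Scala value, interpolated into the SQL text
// A new query string is produced every time, so Spark re-parses
// and re-plans the statement on each call.
sqlContext.sql(s"SELECT * FROM src WHERE col1 = $val1").collect().foreach(println)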
Answered by mrsrinivas
I verified it in both the Spark shell 2.x and Thrift (beeline) as well. I was able to bind a variable in a Spark SQL query with the set command.
Query without bind variable:
select count(1) from mytable;
Query with bind variable (parameterized):
1. Spark SQL shell

set key_tbl=mytable; -- setting mytable to key_tbl to use as ${key_tbl}
select count(1) from ${key_tbl};

2. Spark shell

spark.sql("set key_tbl=mytable")
spark.sql("select count(1) from ${key_tbl}").collect()
Both with and without bind params, the query returns an identical result.
Note: Don't put any quotes around the value of the key, since it's a table name here.
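If the substituted value were a string literal rather than an identifier, the quotes would go in the query text itself. A small sketch under that assumption (status and orders are hypothetical names):

spark.sql("set status=shipped")
spark.sql("select count(1) from orders where state = '${status}'").collect()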
Let me know if there are any questions.
Answered by Vijay Krishna
Pyspark
sqlContext.sql("SELECT * FROM src WHERE col1 = {1} and col2 = {2}".format(VAL1,VAL2).collect().foreach(println)
Answered by piyushmandovra
Try this:
sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println)

