Original URL: http://stackoverflow.com/questions/26755230/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
dynamically bind variable/parameter in Spark SQL?
Asked by user3769729
How can I bind a variable in Apache Spark SQL? For example:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)
Answered by Tagar
Spark SQL (as of 1.6 release) does not support bind variables.
P.S. What Ashrith is suggesting is not a bind variable: you're constructing a new string every time, so every time Spark will parse the query, create an execution plan, etc. The purpose of bind variables (in RDBMS systems, for example) is to cut the time spent creating the execution plan (which can be costly when there are a lot of joins, etc.). Spark would have to have a special API to "parse" a query once and then "bind" variables to it. Spark does not have this functionality (as of today, the Spark 1.6 release).
Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
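For illustration, here is a minimal sketch of the string-construction workaround referred to above, using the table and column names from the question. This is not a bind variable; a fresh SQL string is built on each call:

val val1 = 42  // an ordinary Scala value, interpolated into the SQL text
// A new query string is produced every time, so Spark re-parses
// and re-plans the statement on each call.
sqlContext.sql(s"SELECT * FROM src WHERE col1 = $val1").collect().foreach(println)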
Answered by mrsrinivas
I verified it in both the Spark shell 2.x and Thrift (beeline) as well. I was able to bind a variable in a Spark SQL query with the set command.
Query without bind variable:
select count(1) from mytable;
Query with bind variable (parameterized):
1. Spark SQL shell

set key_tbl=mytable; -- setting mytable to key_tbl to use as ${key_tbl}
select count(1) from ${key_tbl};

2. Spark shell

spark.sql("set key_tbl=mytable")
spark.sql("select count(1) from ${key_tbl}").collect()
Both with and without bind params, the query returns an identical result.
Note: Don't put any quotes around the value of the key, since it's a table name here.
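If the substituted value were a string literal rather than an identifier, the quotes would go in the query text itself. A small sketch under that assumption (status and orders are hypothetical names):

spark.sql("set status=shipped")
spark.sql("select count(1) from orders where state = '${status}'").collect()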
Let me know if there are any questions.
Answered by Vijay Krishna
Pyspark
sqlContext.sql("SELECT * FROM src WHERE col1 = {1} and col2 = {2}".format(VAL1,VAL2).collect().foreach(println)
Answered by piyushmandovra
Try this:
sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println)

