scala - How to escape column names with hyphen in Spark SQL

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/30889630/

Date: 2020-10-22 07:15:57  Source: igfitidea

How to escape column names with hyphen in Spark SQL

scala, apache-spark, apache-spark-sql

Asked by sfactor

I have imported a json file in Spark and converted it into a table as

myDF.registerTempTable("myDF")

I then want to run SQL queries on this resulting table:

val newTable = sqlContext.sql("select column-1 from myDF")

However, this gives me an error because of the hyphen in the column name column-1. How do I resolve this in Spark SQL?

Answered by PermaFrost

Backticks (`) appear to work, so

val newTable = sqlContext.sql("select `column-1` from myDF")

should do the trick, at least in Spark v1.3.x.
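The quoting rule can be sketched as a small helper (hypothetical, not part of Spark's API). It wraps a name in backticks and, assuming Spark's convention that a literal backtick inside a quoted identifier is escaped by doubling it, also handles names that themselves contain backticks:

```scala
// Hypothetical helper: backtick-quote a column name for use in a
// Spark SQL string. A literal backtick inside the name is doubled,
// mirroring Spark's identifier-quoting convention.
def quoteIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"

val query = s"select ${quoteIdentifier("column-1")} from myDF"
println(query) // select `column-1` from myDF
```

The resulting string can then be passed to sqlContext.sql as in the answer above.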

Answered by GreenThumb

I was at it for a bit yesterday; it turns out there is a way to escape a colon (:) and a dot (.), like so:

Only the field containing the colon (:) needs to be escaped with backticks:

sqlc.sql("select `sn2:AnyAddRq`.AnyInfo.noInfo.someRef.myInfo.someData.Name AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()
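The "escape only the offending segment" idea can be sketched as a hypothetical helper that backtick-quotes just the path segments that are not plain identifiers (e.g. ones containing a colon), leaving ordinary segments untouched:

```scala
// Hypothetical helper: given a dotted field path, backtick-quote only
// the segments containing characters outside [A-Za-z0-9_]; plain
// identifier segments pass through unchanged.
def quotePath(path: String): String =
  path.split('.').map { seg =>
    if (seg.matches("[A-Za-z_][A-Za-z0-9_]*")) seg
    else "`" + seg.replace("`", "``") + "`"
  }.mkString(".")

println(quotePath("sn2:AnyAddRq.AnyInfo.noInfo.someRef"))
// `sn2:AnyAddRq`.AnyInfo.noInfo.someRef
```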

Answered by GreenThumb

I cannot comment as I have less than 50 rep.

When you are referencing a json structure with struct.struct.field and there is a namespace present, like:

ns2:struct.struct.field, the backticks (`) do not work:

jsonDF = sqlc.read.load('jsonMsgs', format="json")
jsonDF.registerTempTable("masterTable")
sqlc.sql("select `sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name` AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()

pyspark.sql.utils.AnalysisException: u"cannot resolve 'sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name'

If I remove the sn2: fields, the query executes.

I have also tried with single quotes ('), backslashes (\) and double quotes (").

The only way it works is if I register another temp table on the sn2: structure; then I am able to access the fields within it like so:

anotherDF = jsonDF.select("sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData")
anotherDF.registerTempTable("anotherDF")
sqlc.sql("select Name from anotherDF").show()