如何在 spark sql 中解析嵌套的 JSON 对象?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/29948789/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to parse nested JSON objects in spark sql?
提问by None
I have a schema as shown below. How can i parse the nested objects
我有一个如下所示的架构。我如何解析嵌套对象
root
|-- apps: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- appName: string (nullable = true)
| | |-- appPackage: string (nullable = true)
| | |-- Ratings: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- date: string (nullable = true)
| | | | |-- rating: long (nullable = true)
|-- id: string (nullable = true)
回答by Vasilis Vagias
Assuming you read in a json file and print the schema you are showing us like this:
假设您读入一个 json 文件并打印您向我们展示的架构,如下所示:
DataFrame df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();
Then you can select nested objects inside a struct type like so...
然后你可以像这样在结构类型中选择嵌套对象......
DataFrame app = df.select("app");
app.registerTempTable("app");
app.printSchema();
app.show();
DataFrame appName = app.select("element.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();
回答by ben jarman
Try this:
尝试这个:
val nameAndAddress = sqlContext.sql("""
SELECT name, address.city, address.state
FROM people
""")
nameAndAddress.collect.foreach(println)
Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
来源:https: //databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
回答by Adelina Balasa
Have you tried doing it straight from the SQL query like
您是否尝试过直接从 SQL 查询中执行此操作,例如
Select apps.element.Ratings from yourTableName
This will probably return an array and you can more easily access the elements inside. Also, I use this online Json viewer when I have to deal with large JSON structures and the schema is too complex: http://jsonviewer.stack.hu/
这可能会返回一个数组,您可以更轻松地访问其中的元素。另外,当我必须处理大型 JSON 结构并且架构太复杂时,我会使用这个在线 Json 查看器:http: //jsonviewer.stack.hu/
回答by filip stepniak
I am using pyspark, but the logic should be similar. I found this way of parsing my nested json useful:
我正在使用 pyspark,但逻辑应该是相似的。我发现这种解析嵌套 json 的方法很有用:
df.select(df.apps.appName.alias("apps_Name"), \
df.apps.appPackage.alias("apps_Package"), \
df.apps.Ratings.date.alias("apps_Ratings_date")) \
.show()
The code could be obviously shorten with a f-string.
使用 f 字符串显然可以缩短代码。
回答by Ganesh
var df = spark.read.format("json").load("/path/to/file")
df.createOrReplaceTempView("df");
spark.sql("select apps.element.Ratings from df where apps.element.appName like '%app_name%' ").show()

