如何在 spark sql 中解析嵌套的 JSON 对象?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/29948789/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-03 17:52:31  来源:igfitidea点击:

How to parse nested JSON objects in spark sql?

jsonapache-sparkapache-spark-sql

提问by None

I have a schema as shown below. How can i parse the nested objects

我有一个如下所示的架构。我如何解析嵌套对象

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)

回答by Vasilis Vagias

Assuming you read in a json file and print the schema you are showing us like this:

假设您读入一个 json 文件并打印您向我们展示的架构,如下所示:

DataFrame df = sqlContext.read().json("/path/to/file").toDF();
    df.registerTempTable("df");
    df.printSchema();

Then you can select nested objects inside a struct type like so...

然后你可以像这样在结构类型中选择嵌套对象......

DataFrame app = df.select("app");
        app.registerTempTable("app");
        app.printSchema();
        app.show();
DataFrame appName = app.select("element.appName");
        appName.registerTempTable("appName");
        appName.printSchema();
        appName.show();

回答by ben jarman

Try this:

尝试这个:

val nameAndAddress = sqlContext.sql("""
    SELECT name, address.city, address.state
    FROM people
""")
nameAndAddress.collect.foreach(println)

Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

来源:https: //databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

回答by Adelina Balasa

Have you tried doing it straight from the SQL query like

您是否尝试过直接从 SQL 查询中执行此操作,例如

Select apps.element.Ratings from yourTableName

This will probably return an array and you can more easily access the elements inside. Also, I use this online Json viewer when I have to deal with large JSON structures and the schema is too complex: http://jsonviewer.stack.hu/

这可能会返回一个数组,您可以更轻松地访问其中的元素。另外,当我必须处理大型 JSON 结构并且架构太复杂时,我会使用这个在线 Json 查看器:http: //jsonviewer.stack.hu/

回答by filip stepniak

I am using pyspark, but the logic should be similar. I found this way of parsing my nested json useful:

我正在使用 pyspark,但逻辑应该是相似的。我发现这种解析嵌套 json 的方法很有用:

df.select(df.apps.appName.alias("apps_Name"), \
          df.apps.appPackage.alias("apps_Package"), \
          df.apps.Ratings.date.alias("apps_Ratings_date")) \
   .show()

The code could be obviously shorten with a f-string.

使用 f 字符串显然可以缩短代码。

回答by Ganesh

var df = spark.read.format("json").load("/path/to/file")
df.createOrReplaceTempView("df");
spark.sql("select apps.element.Ratings from df where apps.element.appName like '%app_name%' ").show()