How to convert DataFrame to Json?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31473215/

Date: 2020-09-03 17:58:57 · Source: igfitidea

Tags: json, scala, apache-spark, apache-spark-sql

Asked by ashish.garg

I have a huge JSON file; a small part of it follows:

{
    "socialNews": [{
        "adminTagIds": "",
        "fileIds": "",
        "departmentTagIds": "",
        ........
        ........
        "comments": [{
            "commentId": "",
            "newsId": "",
            "entityId": "",
            ....
            ....
        }]
    }]
    .....
    }

I have applied a lateral view explode on socialNews as follows:

// Spark 1.x: load the JSON file into a DataFrame (jsonFile is the pre-1.4 reader API)
val rdd = sqlContext.jsonFile("file:///home/ashish/test")
// Register as a temp table so it can be queried with SQL
rdd.registerTempTable("social")
// Flatten the socialNews array into one row per element
val result = sqlContext.sql("select * from social LATERAL VIEW explode(socialNews) social AS comment")

Now I want to convert this result (DataFrame) back to JSON and save it to a file, but I am not able to find any Scala API to do the conversion. Is there a standard library for this, or some other way to figure it out?

Answered by Nikita

// Read JSON into a DataFrame, then write it back out as JSON (one record per line)
val result: DataFrame = sqlContext.read.json(path)
result.write.json("/yourPath")

The method write is in the class DataFrameWriter and should be accessible to you on DataFrame objects. Just make sure that your rdd is of type DataFrame and not of the deprecated type SchemaRDD. You can explicitly provide a type annotation, val data: DataFrame, or convert to a DataFrame with toDF().
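
For instance, if you start from an RDD rather than a DataFrame, here is a minimal sketch of the toDF() route (assuming a Spark 1.x sqlContext in scope and a hypothetical Comment case class):

import sqlContext.implicits._ // brings toDF() into scope

// Hypothetical record type standing in for one exploded comment
case class Comment(commentId: String, newsId: String, entityId: String)

val comments = sc.parallelize(Seq(
  Comment("c1", "n1", "e1"),
  Comment("c2", "n1", "e2")
))

// Convert the RDD to a DataFrame, then write it back out as JSON (one record per line)
comments.toDF().write.json("/tmp/commentsAsJson")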

Answered by MrChristine

If you have a DataFrame, there is an API to convert it back to an RDD[String] that contains the JSON records.

// Build a small example DataFrame
val df = Seq((2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")
// toJSON converts each row to a JSON string, e.g. {"year":2012,"month":8,"title":"Batman","rating":9.8}
df.toJSON.saveAsTextFile("/tmp/jsonRecords")
df.toJSON.take(2).foreach(println)

This should be available from Spark 1.4 onward. Call the API on the result DataFrame you created.

The available APIs are listed in the Spark documentation.

Answered by abhijitcaps

// Round-trip: toJSON serializes each row to a JSON string, and read.json
// parses the strings back into a DataFrame (re-inferring the schema)
sqlContext.read.json(dataFrame.toJSON)

Answered by Chetan Tamballa

If you still can't figure out a way to convert a DataFrame into JSON, you can use the to_json or toJSON built-in Spark functions.

Let me know if you have a sample DataFrame and a target JSON format to convert to.
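
As a minimal sketch of the to_json route (assuming Spark 2.1+, where to_json was added to org.apache.spark.sql.functions, and a hypothetical two-column DataFrame), wrap all columns in a struct and serialize each row to a single JSON string column:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

val spark = SparkSession.builder.appName("toJsonSketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical input DataFrame
val df = Seq((1, "hello"), (2, "world")).toDF("id", "msg")

// Serialize a struct of all columns into one JSON string per row
df.select(to_json(struct(df.columns.map(col): _*)).alias("json")).show(false)
// {"id":1,"msg":"hello"}
// {"id":2,"msg":"world"}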

Answered by Ganesh

When you run your Spark job with --master local --deploy-mode client, then df.write.json("path/to/file/data.json") works.

If you run on a cluster (--master yarn --deploy-mode cluster), a better approach is to write the data to AWS S3 or Azure Blob storage and read it from there.

df.write.json("s3://bucket/path/to/file/data.json") works.
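
As a sketch (assuming the Hadoop AWS connector and S3 credentials are configured, and a hypothetical bucket name), the cluster-friendly write could look like:

// s3a is the Hadoop filesystem scheme commonly used from Spark
df.write
  .mode("overwrite") // replace any previous output at this path
  .json("s3a://my-bucket/exports/data-json")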