scala 将 RDD 转换为 JSON 对象
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/30758105/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Convert RDD to JSON Object
提问by Vamsi
I have an RDD of type RDD[(String, List[String])].
我有一个 RDD[(String, List[String])] 类型的 RDD。
Example:
例子:
(FRUIT, List(Apple,Banana,Mango))
(VEGETABLE, List(Potato,Tomato))
I want to convert the above output to json object like below.
我想将上面的输出转换为 json 对象,如下所示。
{
"categories": [
{
"name": "FRUIT",
"nodes": [
{
"name": "Apple",
"isInTopList": false
},
{
"name": "Banana",
"isInTopList": false
},
{
"name": "Mango",
"isInTopList": false
}
]
},
{
"name": "VEGETABLE",
"nodes": [
{
"name": "POTATO",
"isInTopList": false
},
{
"name": "TOMATO",
"isInTopList": false
},
]
}
]
}
Please suggest the best possible way to do it.
请建议最好的方法来做到这一点。
NOTE: "isInTopList": falseis always constant and has to be there with every item in the jsonobject.
注意:"isInTopList": false始终不变,并且必须与 jsonobject 中的每个项目一起存在。
回答by NRR
First I used the following code to reproduce the scenario that you mentioned:
首先,我使用以下代码重现您提到的场景:
val sampleArray = Array(
("FRUIT", List("Apple", "Banana", "Mango")),
("VEGETABLE", List("Potato", "Tomato")))
val sampleRdd = sc.parallelize(sampleArray)
sampleRdd.foreach(println) // Printing the result
Now, I am using json4sScala library to convert this RDD into the JSON structure that you requested:
现在,我使用json4sScala 库将此 RDD 转换为您请求的 JSON 结构:
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL.WithDouble._
val json = "categories" -> sampleRdd.collect().toList.map{
case (name, nodes) =>
("name", name) ~
("nodes", nodes.map{
name => ("name", name)
})
}
println(compact(render(json))) // Printing the rendered JSON
The result is:
结果是:
{"categories":[{"name":"FRUIT","nodes":[{"name":"Apple"},{"name":"Banana"},{"name":"Mango"}]},{"name":"VEGETABLE","nodes":[{"name":"Potato"},{"name":"Tomato"}]}]}
回答by Daniel Langdon
Since you want a single JSON for you entire RDD, I would start by doing Rdd.collect. Be careful that your set fits in memory, as this will move the data back to the driver.
由于您需要为整个 RDD 使用单个 JSON,因此我将从执行Rdd.collect. 请注意您的设置是否适合内存,因为这会将数据移回驱动程序。
To get the json, just use a library to traverse your objects. I like Json4s due to its simple internal structure and practical, clean operators. Here is a sample from their website that shows how to traverse nested structures (in particular, lists):
要获取 json,只需使用库来遍历您的对象。我喜欢 Json4s,因为它的内部结构简单,操作符简洁实用。这是他们网站上的一个示例,展示了如何遍历嵌套结构(特别是列表):
object JsonExample extends App {
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.Hymanson.JsonMethods._
case class Winner(id: Long, numbers: List[Int])
case class Lotto(id: Long, winningNumbers: List[Int], winners: List[Winner], drawDate: Option[java.util.Date])
val winners = List(Winner(23, List(2, 45, 34, 23, 3, 5)), Winner(54, List(52, 3, 12, 11, 18, 22)))
val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)
val json =
("lotto" ->
("lotto-id" -> lotto.id) ~
("winning-numbers" -> lotto.winningNumbers) ~
("draw-date" -> lotto.drawDate.map(_.toString)) ~
("winners" ->
lotto.winners.map { w =>
(("winner-id" -> w.id) ~
("numbers" -> w.numbers))}))
println(compact(render(json)))
}

