scala - Flattening Rows in Spark
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/32906613/
Posted: 2020-10-22 07:40:58 | Source: igfitidea
Asked by Nir Ben Yaacov
I am doing some testing for Spark using Scala. We usually read JSON files that need to be manipulated, as in the following example:
test.json:
{"a":1,"b":[2,3]}
val test = sqlContext.read.json("test.json")
How can I convert it to the following format:
{"a":1,"b":2}
{"a":1,"b":3}
Answered by zero323
You can use the explode function:
scala> import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.explode
scala> val test = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]}""")))
test: org.apache.spark.sql.DataFrame = [a: bigint, b: array<bigint>]
scala> test.printSchema
root
|-- a: long (nullable = true)
|-- b: array (nullable = true)
| |-- element: long (containsNull = true)
scala> val flattened = test.withColumn("b", explode($"b"))
flattened: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint]
scala> flattened.printSchema
root
|-- a: long (nullable = true)
|-- b: long (nullable = true)
scala> flattened.show
+---+---+
| a| b|
+---+---+
| 1| 2|
| 1| 3|
+---+---+
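Conceptually, explode behaves like a flatMap over the rows: every element of the array column produces its own output row, with the scalar columns duplicated. A minimal plain-Scala sketch of that idea (no Spark required; the Record case class and sample data here are illustrative, not part of the Spark API):

```scala
// Illustrative stand-in for a DataFrame row with schema [a: long, b: array<long>].
case class Record(a: Long, b: Seq[Long])

object ExplodeSketch {
  def main(args: Array[String]): Unit = {
    val rows = Seq(Record(1L, Seq(2L, 3L)))

    // flatMap emits one (a, element) pair per array element, mirroring
    // what withColumn("b", explode($"b")) does in the DataFrame API.
    val flattened = rows.flatMap(r => r.b.map(elem => (r.a, elem)))

    println(flattened) // List((1,2), (1,3))
  }
}
```

This is only a model of the semantics; in Spark itself, explode runs as a generator expression inside the query plan rather than as a literal flatMap over JVM objects.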

