
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/32906613/

Date: 2020-10-22 07:40:58  Source: igfitidea

Flattening Rows in Spark

scala, apache-spark, apache-spark-sql, distributed-computing

Asked by Nir Ben Yaacov

I am doing some testing in Spark using Scala. We usually read JSON files which need to be manipulated like the following example:

test.json:


{"a":1,"b":[2,3]}
val test = sqlContext.read.json("test.json")

How can I convert it to the following format:


{"a":1,"b":2}
{"a":1,"b":3}

Answered by zero323

You can use the explode function:

scala> import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.functions.explode


scala> val test = sqlContext.read.json(sc.parallelize(Seq("""{"a":1,"b":[2,3]}""")))
test: org.apache.spark.sql.DataFrame = [a: bigint, b: array<bigint>]

scala> test.printSchema
root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)

scala> val flattened = test.withColumn("b", explode($"b"))
flattened: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint]

scala> flattened.printSchema
root
 |-- a: long (nullable = true)
 |-- b: long (nullable = true)

scala> flattened.show
+---+---+
|  a|  b|
+---+---+
|  1|  2|
|  1|  3|
+---+---+
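Conceptually, explode behaves like a flatMap over the array column: each row is replaced by one output row per array element, with the other columns duplicated. A minimal pure-Scala sketch of the same flattening (no Spark; the Record case class and field names here are hypothetical, chosen to mirror the JSON above):

```scala
// Hypothetical plain-Scala model of the rows {"a":1,"b":[2,3]} — not a Spark API.
case class Record(a: Long, b: Seq[Long])

val rows = Seq(Record(1L, Seq(2L, 3L)))

// flatMap expands each element of the "b" array into its own (a, b) pair,
// mirroring what explode does to the "b" column in the DataFrame answer.
val flattened = rows.flatMap(r => r.b.map(v => (r.a, v)))

println(flattened) // List((1,2), (1,3))
```

This is why explode multiplies the row count: a row whose array has n elements produces n output rows (and, with the default explode, a row with an empty array produces none).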