Scala Spark: Explode a dataframe array of structs and append id

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/42354372/

Date: 2020-10-22 09:06:25  Source: igfitidea

Spark: Explode a dataframe array of structs and append id

Tags: scala, apache-spark, spark-dataframe

Asked by Steve

I currently have a dataframe with an id and a column which is an array of structs:

 root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)

Here is an example table with data:


 id | lists
 -----------
 1  | [[a, 1.0], [b, 2.0]]
 2  | [[c, 3.0]]

How do I transform the above dataframe to the one below? I need to "explode" the array and append the id at the same time.


 id | col1  | col2
 -----------------
 1  | a     | 1.0
 1  | b     | 2.0
 2  | c     | 3.0


Edit note:

Note there is a difference between the two schemas below. The first one contains an array of structs, while the latter contains only an array of plain elements.

 root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)


root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)

Answered by user7595317

explode is exactly the function for this:

import org.apache.spark.sql.functions._

df.select($"id", explode($"lists")).select($"id", $"col.text", $"col.amount")
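For completeness, here is a self-contained sketch of the whole pipeline. The case class and sample data are mine, constructed to match the schema in the question (text/amount fields), and the output columns are aliased to col1/col2 as the question asks; this is not the original poster's code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Mirrors the struct in the question's schema.
case class Element(text: String, amount: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-structs")
  .getOrCreate()
import spark.implicits._

// id plus an array of (text, amount) structs, as in the example table.
val df = Seq(
  (1, Seq(Element("a", 1.0), Element("b", 2.0))),
  (2, Seq(Element("c", 3.0)))
).toDF("id", "lists")

// explode() emits one row per array element in a column named "col";
// selecting id in the same projection carries the id onto every row.
val result = df
  .select($"id", explode($"lists"))
  .select($"id", $"col.text".as("col1"), $"col.amount".as("col2"))

result.show()
```

Because explode names its output column "col", the struct fields are reached with $"col.text" and $"col.amount"; aliasing them produces exactly the id/col1/col2 layout requested above.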