Fetch Spark dataframe column list (Scala)
Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46752273/
Fetch Spark dataframe column list
Asked by RaAm
How do I get all the column names of a Spark dataframe into a Seq variable?
Input Data & Schema
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
I need to store all the column names in a variable using Scala. I have tried the following, but it is not giving the expected result.
val selectColumns = dataset1.schema.fields.toSeq
selectColumns: Seq[org.apache.spark.sql.types.StructField] = WrappedArray(StructField(KEY1,StringType,true),StructField(KEY2,StringType,true),StructField(ID,StringType,true))
Expected output:
val selectColumns = Seq(
col("KEY1"),
col("KEY2"),
col("ID")
)
selectColumns: Seq[org.apache.spark.sql.Column] = List(KEY1, KEY2, ID)
Answered by Yaron
You can use the following command:
val selectColumns = dataset1.columns.toSeq
scala> val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1: org.apache.spark.sql.DataFrame = [KEY1: string, KEY2: string ... 1 more field]
scala> val selectColumns = dataset1.columns.toSeq
selectColumns: Seq[String] = WrappedArray(KEY1, KEY2, ID)
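As the REPL output shows, `columns` returns an `Array[String]`, and `.toSeq` simply presents that array as a `Seq` (hence the `WrappedArray` in the output). A minimal plain-Scala sketch of the same conversion, no Spark session required:

```scala
// Dataset.columns returns an Array[String]; .toSeq exposes it as a Seq.
// (Plain Scala stand-in for the Spark call above.)
val columns: Array[String] = Array("KEY1", "KEY2", "ID")
val colSeq: Seq[String] = columns.toSeq

// Seq equality is element-wise, so the concrete wrapper type does not matter:
println(colSeq == Seq("KEY1", "KEY2", "ID"))  // true
```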
Answered by RaAm
import org.apache.spark.sql.functions.col

val selectColumns = dataset1.columns.toList.map(col(_))
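This maps each column name to a `Column` via `org.apache.spark.sql.functions.col`, giving the `Seq[Column]` the question asked for. The same map-over-names shape in plain Scala, with a hypothetical `Col` case class standing in for Spark's `Column`:

```scala
// Hypothetical stand-in for org.apache.spark.sql.Column, only to show the shape.
case class Col(name: String)
def col(name: String): Col = Col(name)

val names = List("KEY1", "KEY2", "ID")
val selectColumns = names.map(col(_))
println(selectColumns)  // List(Col(KEY1), Col(KEY2), Col(ID))
```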
Answered by uh_big_mike_boi
I use the columns property like so
val cols = dataset1.columns.toSeq
and then, if you later want to select all the columns in the order of the sequence from head to tail, you can use
val orderedDF = dataset1.select(cols.head, cols.tail: _*)
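The `cols.head, cols.tail: _*` idiom exists because this overload of `select` takes one fixed argument plus a varargs tail. A plain-Scala sketch of that calling pattern (the `select` below is a stand-in with the same signature shape, not Spark's):

```scala
// Stand-in with the same shape as Dataset.select(col: String, cols: String*):
// one required first argument, then a varargs tail expanded with ": _*".
def select(first: String, rest: String*): List[String] = first :: rest.toList

val cols = Seq("KEY1", "KEY2", "ID")
val selected = select(cols.head, cols.tail: _*)
println(selected)  // List(KEY1, KEY2, ID)
```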
Answered by Krishna Reddy
We can get the column names of a dataset/table into a Seq variable in the following ways.
From a Dataset:
val col_seq:Seq[String] = dataset.columns.toSeq
From a table:
val col_seq:Seq[String] = spark.table("tablename").columns.toSeq
or
val col_seq: Seq[String] = spark.catalog.listColumns("tablename").select("name").collect.map(_.getString(0)).toSeq
Answered by Abhi
The columns can be fetched from schema too.
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
val selectColumns = dataset1.schema.fieldNames
selectColumns: Array[String] = Array(KEY1, KEY2, ID)
val selectColumns2 = dataset1.schema.fieldNames.toSeq
selectColumns2: Seq[String] = WrappedArray(KEY1, KEY2, ID)

