Fetch Spark dataframe column list (Scala)
Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46752273/
Fetch Spark dataframe column list
Asked by RaAm
How do I get all the column names of a Spark dataframe into a Seq variable?
Input Data & Schema
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
I need to store all the column names in a variable using Scala. I have tried the following, but it is not giving the expected result.
val selectColumns = dataset1.schema.fields.toSeq
selectColumns: Seq[org.apache.spark.sql.types.StructField] = WrappedArray(StructField(KEY1,StringType,true),StructField(KEY2,StringType,true),StructField(ID,StringType,true))
Expected output:
val selectColumns = Seq(
col("KEY1"),
col("KEY2"),
col("ID")
)
selectColumns: Seq[org.apache.spark.sql.Column] = List(KEY1, KEY2, ID)
Answered by Yaron
You can use the following command:
val selectColumns = dataset1.columns.toSeq
scala> val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1: org.apache.spark.sql.DataFrame = [KEY1: string, KEY2: string ... 1 more field]
scala> val selectColumns = dataset1.columns.toSeq
selectColumns: Seq[String] = WrappedArray(KEY1, KEY2, ID)
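As the REPL output shows, `columns` returns an `Array[String]`, and `.toSeq` simply presents that array as a `Seq` (hence the `WrappedArray` in the output). A minimal plain-Scala sketch of the same conversion, no Spark session required:

```scala
// Dataset.columns returns an Array[String]; .toSeq exposes it as a Seq.
// (Plain Scala stand-in for the Spark call above.)
val columns: Array[String] = Array("KEY1", "KEY2", "ID")
val colSeq: Seq[String] = columns.toSeq

// Seq equality is element-wise, so the concrete wrapper type does not matter:
println(colSeq == Seq("KEY1", "KEY2", "ID"))  // true
```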
Answered by RaAm
import org.apache.spark.sql.functions.col

val selectColumns = dataset1.columns.toList.map(col(_))
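This maps each column name to a `Column` via `org.apache.spark.sql.functions.col`, giving the `Seq[Column]` the question asked for. The same map-over-names shape in plain Scala, with a hypothetical `Col` case class standing in for Spark's `Column`:

```scala
// Hypothetical stand-in for org.apache.spark.sql.Column, only to show the shape.
case class Col(name: String)
def col(name: String): Col = Col(name)

val names = List("KEY1", "KEY2", "ID")
val selectColumns = names.map(col(_))
println(selectColumns)  // List(Col(KEY1), Col(KEY2), Col(ID))
```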
Answered by uh_big_mike_boi
I use the columns property like so
val cols = dataset1.columns.toSeq
and then, if you later want to select all the columns in the order of the sequence from head to tail, you can use
val orderedDF = dataset1.select(cols.head, cols.tail: _*)
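The `cols.head, cols.tail: _*` idiom exists because this overload of `select` takes one fixed argument plus a varargs tail. A plain-Scala sketch of that calling pattern (the `select` below is a stand-in with the same signature shape, not Spark's):

```scala
// Stand-in with the same shape as Dataset.select(col: String, cols: String*):
// one required first argument, then a varargs tail expanded with ": _*".
def select(first: String, rest: String*): List[String] = first :: rest.toList

val cols = Seq("KEY1", "KEY2", "ID")
val selected = select(cols.head, cols.tail: _*)
println(selected)  // List(KEY1, KEY2, ID)
```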
Answered by Krishna Reddy
We can get the column names of a dataset/table into a Seq variable in the following ways.
From a Dataset:
val col_seq:Seq[String] = dataset.columns.toSeq
From a table:
val col_seq:Seq[String] = spark.table("tablename").columns.toSeq
or
val col_seq: Seq[String] = spark.catalog.listColumns("tablename").select("name").collect.map(_.getString(0)).toSeq
Answered by Abhi
The columns can be fetched from schema too.
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
val selectColumns = dataset1.schema.fieldNames
selectColumns: Array[String] = Array(KEY1, KEY2, ID)
val selectColumns2 = dataset1.schema.fieldNames.toSeq
selectColumns2: Seq[String] = WrappedArray(KEY1, KEY2, ID)

