scala: How to query the column names of a Spark Dataset?

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39578995/

Date: 2020-10-22 08:39:21  Source: igfitidea


Tags: scala, apache-spark, spark-dataframe

Asked by fwc

I have a val ds: Dataset[Double] (in Spark 2.0.0), but what is the name of the double-valued column that can be passed to apply or col to convert this one-column Dataset to a Column?


Answer by fwc

The column name is "value", as in ds.col("value"). Dataset.schema contains this information: ds.schema.fields.foreach(x => println(x))
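A minimal, self-contained sketch of this answer, assuming Spark 2.x or later running in local mode. The object name DatasetValueColumn, the helper sampleDataset and the sample values are illustrative only. It builds a Dataset[Double], prints its schema, and resolves the "value" column with both col and apply:

```scala
import org.apache.spark.sql.{Column, Dataset, SparkSession}

object DatasetValueColumn {
  // Hypothetical helper: builds a small Dataset[Double]. A Dataset of a
  // primitive type is encoded as a single column named "value".
  def sampleDataset(spark: SparkSession): Dataset[Double] = {
    import spark.implicits._
    Seq(1.0, 2.5, 4.0).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-value-column")
      .master("local[*]") // local mode, assumed for running outside a cluster
      .getOrCreate()

    val ds = sampleDataset(spark)

    // Each StructField shows the column's name, data type and nullability.
    ds.schema.fields.foreach(x => println(x))

    // col("value") and apply("value") both resolve the column to a Column.
    val viaCol: Column = ds.col("value")
    val viaApply: Column = ds("value")

    // The Column can then be used in expressions, e.g. arithmetic in select.
    ds.select(viaCol * 2).show()
    ds.select(viaApply + 1).show()

    spark.stop()
  }
}
```

ds("value") is simply Dataset.apply, so it is interchangeable with ds.col("value") here.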


Answer by Alberto Bonsanto

You could also use the DataFrame method columns, which returns the names of all columns as an Array[String].


// spark-shell session: sc and spark.implicits._ (needed for toDF) are already in scope
case class Person(age: Int, height: Int, weight: Int) {
  def sum = age + height + weight
}

val df = sc.parallelize(List(Person(1, 2, 3), Person(4, 5, 6))).toDF("age", "height", "weight")

df.columns
//res0: Array[String] = Array(age, height, weight)
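A sketch along the same lines, assuming Spark 2.x or later in local mode; ColumnNames, names and allColumns are hypothetical names. It shows that columns also works on typed Datasets, so for a Dataset[Double] like the one in the question it reports the single "value" column, and that the names can be mapped back to Column objects with col:

```scala
import org.apache.spark.sql.{Column, DataFrame, Dataset, SparkSession}

object ColumnNames {
  // Hypothetical case class mirroring Person above (without sum).
  case class Person(age: Int, height: Int, weight: Int)

  // Column names of any Dataset or DataFrame (same as ds.schema.fieldNames).
  def names(ds: Dataset[_]): Array[String] = ds.columns

  // Turns every column name into a Column via Dataset.col.
  def allColumns(ds: Dataset[_]): Array[Column] = ds.columns.map(c => ds.col(c))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-names")
      .master("local[*]") // local mode, assumed for running outside a cluster
      .getOrCreate()
    import spark.implicits._

    val people: DataFrame = Seq(Person(1, 2, 3), Person(4, 5, 6)).toDF()
    val doubles: Dataset[Double] = Seq(1.0, 2.0).toDS()

    // Print the column names of both the DataFrame and the typed Dataset.
    println(names(people).mkString(", "))
    println(names(doubles).mkString(", "))

    // Select every column back out using the Column objects built from names.
    people.select(allColumns(people): _*).show()

    spark.stop()
  }
}
```

Using columns avoids walking the schema by hand when only the names are needed; schema is still the place to look for data types and nullability.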