Source: Stack Overflow question, http://stackoverflow.com/questions/37992426/, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

How to set column names to toDF() function in spark dataframe using a string array?

Tags: scala, apache-spark

Asked by Devi

For example,

val columns = Array("column1", "column2", "column3")
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5)))).toDF(columns)

How can I set the column names using a string array? Is it possible to specify data types inside toDF()?

Answered by Tzach Zohar

toDF() takes a repeated parameter of type String, so you can use the _* type annotation to pass a sequence:

val df=sc.parallelize(Seq(
  (1,"example1", Seq(0,2,5)),
  (2,"example2", Seq(1,20,5)))).toDF(columns: _*)

For more on repeated parameters, see section 4.6.2 of the Scala Language Specification.
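
The repeated-parameter mechanism can be tried without Spark. The following is only a minimal sketch: joinNames is a hypothetical stand-in for a varargs method such as toDF, and is not part of any library.

```scala
object VarargsSketch {
  // Hypothetical stand-in for a method declared with a repeated parameter,
  // in the same shape as toDF(colNames: String*).
  def joinNames(colNames: String*): String = colNames.mkString(", ")

  val columns: Array[String] = Array("column1", "column2", "column3")

  // Passing the names one by one.
  val direct: String = joinNames("column1", "column2", "column3")

  // Passing an existing array: the ": _*" annotation expands it
  // into the repeated parameter.
  val expanded: String = joinNames(columns: _*)

  // joinNames(columns) would not compile, because an Array[String]
  // is not a String.
}
```

Both calls produce the same string, which is why toDF(columns: _*) behaves like listing the names by hand.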

Answered by anshul_cached

val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5)))).toDF("column1", "column2", "column3")

toDF() takes comma-separated strings.

Answered by shakedzy

toDF() is defined in the Spark documentation as:

def toDF(colNames: String*): DataFrame

So you need to turn your array into varargs, as also described here. That means you need to do the following:

val columns = Array("column1", "column2", "column3")
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5)))).toDF(columns: _*)

(Add : _* to columns in toDF.)
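
For completeness, here is a self-contained sketch of the approach in the answers above. It assumes a local SparkSession, whereas the original snippets rely on an existing sc, and the object name ToDFSketch is made up for illustration. On the question of data types: toDF() only accepts column names, and the column types are inferred from the tuple elements. Casting afterwards, as shown at the end, is one possible way to change a type, not the only one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ToDFSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session created just for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-sketch")
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope for Seq and RDD

    val columns = Array("column1", "column2", "column3")

    // Expand the array into toDF's repeated String parameter.
    val df = Seq(
      (1, "example1", Seq(0, 2, 5)),
      (2, "example2", Seq(1, 20, 5))
    ).toDF(columns: _*)

    // Types come from the tuple elements (Int, String, Seq[Int]),
    // not from anything passed to toDF.
    df.printSchema()

    // One option for changing a type after the fact: cast the column.
    val casted = df.withColumn("column1", col("column1").cast("long"))
    casted.printSchema()

    spark.stop()
  }
}
```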