Original source: http://stackoverflow.com/questions/37992426/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to set column names to toDF() function in spark dataframe using a string array?
Asked by Devi
For example,
val columns = Array("column1", "column2", "column3")
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5))
)).toDF(columns)
How can I set the column names using a string array? Is it possible to specify data types inside toDF()?
Answered by Tzach Zohar
toDF() takes a repeated parameter of type String, so you can use the _* type annotation to pass a sequence:
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5))
)).toDF(columns: _*)
For more on repeated parameters, see section 4.6.2 of the Scala Language Specification.
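The effect of the _* annotation can be seen without Spark. The following is a minimal sketch in plain Scala; the object VarargsSketch and the method describe are hypothetical stand-ins for a method declared with a repeated String parameter, not part of the Spark API:

```scala
object VarargsSketch {
  // Hypothetical stand-in for a method with a repeated parameter,
  // comparable in shape to toDF(colNames: String*).
  def describe(colNames: String*): String = colNames.mkString(", ")

  def main(args: Array[String]): Unit = {
    val columns = Array("column1", "column2", "column3")

    // Passing the names one by one and expanding the array are equivalent.
    val direct = describe("column1", "column2", "column3")
    val expanded = describe(columns: _*)

    // describe(columns) would not compile: an Array[String] is not a String.
    println(direct == expanded) // prints "true"
  }
}
```

Inside the method the repeated parameter is seen as a sequence of String values, which is why an existing array or Seq has to be expanded explicitly at the call site.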
Answered by anshul_cached
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5))
)).toDF("column1", "column2", "column3")
toDF() takes the column names as comma-separated strings.
Answered by shakedzy
toDF() is defined in the Spark documentation as:
def toDF(colNames: String*): DataFrame
So you need to pass your array as varargs, as also described here. That means you need to do the following:
val columns = Array("column1", "column2", "column3")
val df = sc.parallelize(Seq(
  (1, "example1", Seq(0, 2, 5)),
  (2, "example2", Seq(1, 20, 5))
)).toDF(columns: _*)
(Add : _* after columns in the toDF call.)
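The question also asks whether data types can be given inside toDF(). Going by the signature quoted above, toDF() accepts only column names, and the column types come from the element types of the tuples. A minimal sketch of one way to change a type afterwards, assuming Spark 2.x or later with a local SparkSession (the object name ToDfTypesSketch, the helpers build and widen, and the choice of casting column1 to long are illustrative and not taken from the answers):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ToDfTypesSketch {
  val columns: Array[String] = Array("column1", "column2", "column3")

  // Names come from the array; types are inferred from the
  // tuple type (Int, String, Seq[Int]).
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "example1", Seq(0, 2, 5)),
      (2, "example2", Seq(1, 20, 5))
    ).toDF(columns: _*)
  }

  // Assumption for illustration: widen column1 from int to long with a cast.
  def widen(df: DataFrame): DataFrame =
    df.withColumn("column1", col("column1").cast("long"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-types-sketch")
      .getOrCreate()
    val df = build(spark)
    df.printSchema()        // schema inferred from the tuples
    widen(df).printSchema() // same columns, column1 cast to long
    spark.stop()
  }
}
```

Casting after the fact keeps the toDF(columns: _*) call unchanged. If the types need to be fixed up front, building the DataFrame from rows with an explicit schema is another option, which this sketch does not cover.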

