Scala Spark DataFrame: dataFrame.select multiple columns given a sequence of column names

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36131716/

Date: 2020-10-22 08:05:58  Source: igfitidea

Scala Spark DataFrame : dataFrame.select multiple columns given a Sequence of column names

Tags: scala, apache-spark, dataframe, apache-spark-sql

Asked by Himaprasoon

val columnName = Seq("col1", "col2", ....., "coln")

Is there a way to use dataframe.select to get a DataFrame containing only the columns named in this sequence? I know I can call dataframe.select("col1","col2"...), but columnName is generated at runtime. I could call dataframe.select() repeatedly, once for each column name in a loop. Would that have any performance overhead? Is there a simpler way to accomplish this?


Answer by Tzach Zohar

import org.apache.spark.sql.functions.col

val columnNames = Seq("col1","col2",....."coln")

// using the string column names, via select(col: String, cols: String*):
val result = dataframe.select(columnNames.head, columnNames.tail: _*)

// or, equivalently, using Column objects, via select(cols: Column*):
val result2 = dataframe.select(columnNames.map(c => col(c)): _*)
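For readers who want to try both forms end to end, here is a minimal sketch. It assumes spark-sql is on the classpath and runs against a local SparkSession. The object name, app name, sample rows, and the third column col3 are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelectBySeq {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("select-by-seq")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with three columns.
    val dataframe: DataFrame = Seq((1, "a", true), (2, "b", false))
      .toDF("col1", "col2", "col3")

    // Column names that would normally be computed at runtime.
    val columnNames = Seq("col1", "col3")

    // String overload: first name passed separately, the rest expanded as varargs.
    val byName = dataframe.select(columnNames.head, columnNames.tail: _*)

    // Column overload: every name mapped to a Column, then expanded as varargs.
    val byColumn = dataframe.select(columnNames.map(col): _*)

    // Both results expose exactly the requested columns, in the requested order.
    assert(byName.columns.toSeq == columnNames)
    assert(byColumn.columns.toSeq == columnNames)

    spark.stop()
  }
}
```

One practical difference: the string form calls columnNames.head, which throws on an empty sequence, so the Column form may be the safer choice when the list of names could be empty.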

Answer by UserszrKs

Since dataFrame.select() expects Column arguments and we have a sequence of strings, we need to map each string to a Column and then pass the resulting sequence as varargs. columnName.map(name => col(name)) turns the sequence of strings into a sequence of columns, and the : _* ascription lets that sequence be passed as the parameters of select():


  import org.apache.spark.sql.functions.col
  val columnName = Seq("col1", "col2")
  val DFFiltered = DF.select(columnName.map(name => col(name)): _*)
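The role of : _* may be easier to see without Spark in the picture. The sketch below uses toy stand-ins, Col and FakeFrame, which are hypothetical and not Spark classes. Their select methods only mimic the shapes of the two overloads, select(col: String, cols: String*) and select(cols: Column*), and simply record the names they receive.

```scala
// Toy stand-ins (hypothetical, NOT Spark classes) mirroring the two select signatures.
final case class Col(name: String)

final case class FakeFrame(columns: Seq[String]) {
  // Mirrors the String overload: one mandatory name plus varargs.
  def select(col: String, cols: String*): FakeFrame = FakeFrame(col +: cols)

  // Mirrors the Column overload: varargs only.
  def select(cols: Col*): FakeFrame = FakeFrame(cols.map(_.name))
}

object VarargsDemo {
  val frame = FakeFrame(Seq("col1", "col2", "col3"))
  val columnName = Seq("col1", "col3")

  // A Seq[String] cannot fill the (String, String*) shape directly,
  // so it is split into head and tail, and the tail is expanded with `: _*`.
  val byName: FakeFrame = frame.select(columnName.head, columnName.tail: _*)

  // Map each name to a Col, then expand the whole Seq[Col] into varargs with `: _*`.
  val byCol: FakeFrame = frame.select(columnName.map(name => Col(name)): _*)

  def main(args: Array[String]): Unit = {
    println(byName.columns)
    println(byCol.columns)
  }
}
```

Without the : _* ascription, the compiler would treat the Seq as a single argument of the wrong type instead of spreading its elements across the varargs parameter.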