Spark Select with a List of Columns Scala

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/39909863/
Asked by neuroh
I am trying to find a good way of doing a Spark select with a List[Column]. I am exploding a column, then passing back all the columns I am interested in along with my exploded column.
var columns = getColumns(x) // Returns a List[Column]
tempDf.select(columns) // trying to get the selected columns back; this does not compile, since select expects varargs rather than a List
Trying to find a good way of doing this. I know that if it were a string I could do something like:
val result = dataframe.select(columnNames.head, columnNames.tail: _*)
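For context, here is a minimal self-contained sketch of that string-based approach; the SparkSession setup, DataFrame, and column names below are hypothetical stand-ins, not from the original post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("select-by-name").getOrCreate()
import spark.implicits._

// Hypothetical data standing in for the asker's DataFrame
val dataframe = Seq((1, "a", 0.5), (2, "b", 1.5)).toDF("id", "name", "score")
val columnNames = List("id", "name")

// select(col: String, cols: String*) takes one leading String plus varargs,
// hence the head/tail split
val result = dataframe.select(columnNames.head, columnNames.tail: _*)
result.show()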
Answered by Franzi
For Spark 2.0 it seems that you have two options. Both depend on how you manage your columns (as Strings or as Columns).
Spark code (spark-sql_2.11/org/apache/spark/sql/Dataset.scala):
def select(cols: Column*): DataFrame = withPlan {
  Project(cols.map(_.named), logicalPlan)
}
def select(col: String, cols: String*): DataFrame = select((col +: cols).map(Column(_)) : _*)
You can see how, internally, Spark converts your head & tail into a list of Columns in order to call select again.
So, in that case, if you want clear code, I would recommend:
If columns: List[String]:
import org.apache.spark.sql.functions.col
df.select(columns.map(col): _*)
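As a usage sketch (assuming the hypothetical dataframe defined in the earlier example is in scope; the column names are again assumptions):

val stringColumns = List("id", "score") // hypothetical names
// map each String name to a Column, then expand the List as varargs for select(cols: Column*)
dataframe.select(stringColumns.map(col): _*).show()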
Otherwise, if columns: List[Column]:
df.select(columns: _*)
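To close the loop with the original explode use case, here is a minimal sketch of the List[Column] variant, assuming the SparkSession and implicits from the earlier sketch are in scope; the array column, its data, and the alias are hypothetical assumptions, not from the original post:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical DataFrame with an array column to explode
val tempDf = Seq((1, Seq("x", "y")), (2, Seq("z"))).toDF("id", "tags")

// Hypothetical stand-in for the asker's getColumns: the columns to keep,
// plus the exploded array column aliased as "tag"
val columns: List[Column] = List(col("id"), explode(col("tags")).as("tag"))

// Expanding the List[Column] as varargs matches select(cols: Column*)
tempDf.select(columns: _*).show()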

