Warning: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/39909863/


Spark Select with a List of Columns Scala

scala, apache-spark

Asked by neuroh

I am trying to find a good way of doing a Spark select with a List[Column]. I am exploding a column, then passing back all the columns I am interested in along with my exploded column.


var columns = getColumns(x) // returns a List[Column]
tempDf.select(columns)      // does not compile: select expects Column* varargs, not a List

I am trying to find a good way of doing this. I know that if it were a list of strings I could do something like:


val result = dataframe.select(columnNames.head, columnNames.tail: _*)

Answered by Franzi

For Spark 2.0 it seems that you have two options. Both depend on how you manage your columns (as Strings or as Columns).


Spark code (spark-sql_2.11/org/apache/spark/sql/Dataset.scala):


def select(cols: Column*): DataFrame = withPlan {
  Project(cols.map(_.named), logicalPlan)
}

def select(col: String, cols: String*): DataFrame = select((col +: cols).map(Column(_)) : _*)

You can see how, internally, Spark converts your head & tail into a list of Columns and then calls select again.

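As a quick illustration, here is a minimal sketch (the DataFrame and column names are hypothetical) showing that both overloads produce the same result:

// Assumes a SparkSession named `spark` is in scope
import spark.implicits._
import org.apache.spark.sql.functions.col

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Both calls select the same columns:
df.select("id", "value")
df.select(col("id"), col("value"))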

So, in that case, if you want clear code I would recommend:


If columns: List[String]:


import org.apache.spark.sql.functions.col
df.select(columns.map(col): _*)
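
For example (a minimal sketch; df and the column names are placeholders):

val columnNames = List("id", "value")
val result = df.select(columnNames.map(col): _*)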

Otherwise, if columns: List[Column]:


df.select(columns: _*)
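
Tying this back to the question, here is a sketch of the explode-then-select pattern. The DataFrame tempDf, its array column items, and the other column names are hypothetical:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical: keep `id` and explode the array column `items`
val columns: List[Column] = List(col("id"), explode(col("items")).as("item"))

val result = tempDf.select(columns: _*)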