
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35419307/

Date: 2020-10-22 08:01:14  Source: igfitidea

Create array of literals and columns from List of Strings in Spark SQL

arrays scala apache-spark apache-spark-sql

Asked by Benji Kok

I am trying to define functions in Scala that take a list of strings as input and convert them into the columns passed to the dataframe array arguments used in the code below.

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val df2 = df
        .withColumn("columnArray",array(df("foo").cast("String"),df("bar").cast("String")))
        .withColumn("litArray",array(lit("foo"),lit("bar")))

More specifically, I would like to create functions colFunction and litFunction (or just one function if possible) that take a list of strings as an input parameter and can be used as follows:

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val colString = List("foo","bar")
val df2 = df
         .withColumn("columnArray",array(colFunction(colString)))
         .withColumn("litArray",array(litFunction(colString)))

I have tried mapping the colString to an Array of columns with all the transformations, but this doesn't work. Any ideas on how this can be achieved? Many thanks for reading the question, and for any suggestions/solutions.

Answered by zero323

Spark 2.2+:

Support for Seq, Map and Tuple (struct) literals has been added in SPARK-19254. According to tests:

import org.apache.spark.sql.functions.typedLit

typedLit(Seq("foo", "bar"))
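
A short sketch of what this gives you (typedLit builds a Column expression directly, so no SparkSession is needed just to construct it; the withColumn line is shown as a comment and assumes the df from the question):

```scala
import org.apache.spark.sql.functions.typedLit

// typedLit keeps the Scala type, so this yields a single array<string> literal column:
val litArray = typedLit(Seq("foo", "bar"))

// Usage on the question's DataFrame (sketch):
// df.withColumn("litArray", litArray)

// Map (and case-class/tuple) literals are supported the same way:
val litMap = typedLit(Map("foo" -> 1, "bar" -> 2))
```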

Spark < 2.2

Just map with lit and wrap with array:

import org.apache.spark.sql.functions.{array, lit}

def asLitArray[T](xs: Seq[T]) = array(xs map lit: _*)

df.withColumn("an_array", asLitArray(colString)).show
// +---+---+----------+
// |foo|bar|  an_array|
// +---+---+----------+
// |  1|  1|[foo, bar]|
// |  2|  2|[foo, bar]|
// |  3|  3|[foo, bar]|
// +---+---+----------+
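
To mirror the colFunction/litFunction naming from the question, both helpers can be sketched the same way (the names come from the question; constructing the Column expressions needs no SparkSession):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col, lit}

// Treat each string as a column name (the question's "colFunction"):
def colFunction(xs: Seq[String]): Column = array(xs.map(col): _*)

// Treat each string as a literal value (the question's "litFunction"):
def litFunction(xs: Seq[String]): Column = array(xs.map(lit): _*)

// Usage sketch. Note: array(...) is already applied inside the helpers,
// so there is no extra array(...) wrapper at the call site:
// df.withColumn("columnArray", colFunction(List("foo", "bar")))
//   .withColumn("litArray", litFunction(List("foo", "bar")))
```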

Regarding the transformation from Seq[String] to a Column of type Array, this functionality is already provided by:

def array(colName: String, colNames: String*): Column 

or

def array(cols: Column*): Column

Example:

import org.apache.spark.sql.functions.{array, col}

val cols = Seq("bar", "foo")

cols match { case x :: xs => df.select(array(x, xs: _*)) }
// or
df.select(array(cols map col: _*))

Of course all columns have to be of the same type.

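
If the columns are not all the same type already, one workaround (a sketch, mirroring the cast used in the question's first snippet) is to cast each column to a common type before wrapping:

```scala
import org.apache.spark.sql.functions.{array, col}

val cols = Seq("foo", "bar")

// Cast every column to string first so array() sees a single element type.
// Usage sketch: df.select(mixedSafe)
val mixedSafe = array(cols.map(name => col(name).cast("string")): _*)
```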