Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36769169/
Time: 2020-10-22 08:12:06  Source: igfitidea

Scala Spark: How to create an RDD from a list of strings and convert it to a DataFrame

scala apache-spark dataframe rdd union-all

Asked by NehaM

I want to create a DataFrame from a list of strings that matches an existing schema. Here is my code.

    val rowValues = List("ann", "f", "90", "world", "23456") // fails
    val rowValueTuple = ("ann", "f", "90", "world", "23456") // works

    val newRow = sqlContext.sparkContext.parallelize(Seq(rowValueTuple)).toDF(df.columns: _*)

    // show() returns Unit, so keep the unioned DataFrame and display it separately
    val newdf = df.unionAll(newRow)
    newdf.show()

The same code fails if I use the List of String. The difference I see is that with rowValueTuple a Tuple is created. Since the size of the rowValues list changes dynamically, I cannot manually create a Tuple* object. How can I do this? What am I missing? How can I flatten this list to meet the requirement?

I would appreciate your help.

Answered by Vitalii Kotliarenko

A DataFrame has a schema with a fixed number of columns, so it does not seem natural to build a row from a list of variable length. Anyway, you can create your DataFrame from an RDD[Row] using the existing schema, like this:

    import org.apache.spark.sql.Row

    val rdd = sqlContext.sparkContext.parallelize(Seq(rowValues))
    // Expand each list into a Row; every value stays a String
    val rowRdd = rdd.map(v => Row(v: _*))
    val newRow = sqlContext.createDataFrame(rowRdd, df.schema)
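
One caveat with the approach above: Row(v: _*) keeps every value as a String, so it only lines up with df.schema when all columns are string-typed and the list has exactly as many elements as the schema has fields. The following is a minimal sketch of one way to handle that, not part of the original answer. It assumes Spark 2.x or later (SparkSession and union instead of sqlContext and unionAll), and the object AppendRowSketch, the helpers convert and toRow, and the sample DataFrame with its column names are all hypothetical stand-ins for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

object AppendRowSketch {

  // Hypothetical helper: parse a string into the type the schema field expects.
  // Only a few types are handled; anything else is passed through as a String.
  def convert(value: String, dataType: DataType): Any = dataType match {
    case IntegerType => value.toInt
    case LongType    => value.toLong
    case DoubleType  => value.toDouble
    case _           => value
  }

  // Hypothetical helper: build a Row whose values match the schema's field types.
  def toRow(values: Seq[String], schema: StructType): Row = {
    require(values.length == schema.fields.length,
      s"expected ${schema.fields.length} values but got ${values.length}")
    val typed = values.zip(schema.fields).map { case (v, f) => convert(v, f.dataType) }
    Row(typed: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("append-row-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data standing in for the question's existing `df`.
    val df = Seq(("bob", "m", 85, "hello", 12345L))
      .toDF("name", "gender", "score", "greeting", "code")

    val rowValues = List("ann", "f", "90", "world", "23456")

    // Same idea as the answer: RDD[Row] plus the existing schema.
    val rowRdd = spark.sparkContext.parallelize(Seq(toRow(rowValues, df.schema)))
    val newRow = spark.createDataFrame(rowRdd, df.schema)

    df.union(newRow).show()
    spark.stop()
  }
}
```

The length check fails fast when the list size does not match the schema, which is the situation the answer warns about. On Spark 1.x the same idea should carry over with sqlContext.createDataFrame and unionAll, as in the code above.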