Scala Spark: How to create an RDD from a list of strings and convert it to a DataFrame

Question

Asked by NehaM

I want to create a DataFrame from a list of strings that matches an existing schema. Here is my code.

    val rowValues = List("ann", "f", "90", "world", "23456") // fails
    val rowValueTuple = ("ann", "f", "90", "world", "23456") //works

    val newRow = sqlContext.sparkContext.parallelize(Seq(rowValueTuple)).toDF(df.columns: _*)

    val newdf = df.unionAll(newRow).show()

The same code fails if I use the List of String. The difference, as far as I can see, is that rowValueTuple is a Tuple. Since the size of the rowValues list changes dynamically, I cannot manually create a Tuple* object. How can I do this? What am I missing? How can I flatten this list to meet the requirement?

I would appreciate your help.

Answer 1

Answered by Vitalii Kotliarenko

A DataFrame has a schema with a fixed number of columns, so it does not seem natural to build a row from a list of variable length. Anyway, you can create your DataFrame from an RDD[Row] using the existing schema, like this:

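    import org.apache.spark.sql.Row
    val rdd = sqlContext.sparkContext.parallelize(Seq(rowValues))
    // Expand each list into a Row with one field per list element.
    val rowRdd = rdd.map(v => Row(v: _*))
    // Pass the RDD[Row] (rowRdd, not the raw rdd) together with the existing schema.
    val newRow = sqlContext.createDataFrame(rowRdd, df.schema)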
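
One caveat with the approach above: every value in rowValues is a String, so if df.schema contains non-string columns, Spark will likely reject the row at runtime when the data is evaluated. The following is a minimal sketch, not a definitive implementation, of converting each string to the type its schema field expects before building the Row. It assumes Spark 2.x or later with a SparkSession. The names ListToDataFrame, convert, and rowFromStrings, as well as the sample schema, are hypothetical and stand in for your own code and for df.schema.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object ListToDataFrame {
  // Convert one raw string to the JVM type a schema field expects.
  // Only a few types are handled here; extend the match as needed.
  def convert(raw: String, dataType: DataType): Any = dataType match {
    case StringType  => raw
    case IntegerType => raw.toInt
    case LongType    => raw.toLong
    case DoubleType  => raw.toDouble
    case other       => throw new IllegalArgumentException(s"Unsupported type: $other")
  }

  // Build a single-row DataFrame from a list of strings using an existing schema.
  def rowFromStrings(spark: SparkSession, values: Seq[String], schema: StructType): DataFrame = {
    // A DataFrame row must have exactly one value per schema field.
    require(values.length == schema.length,
      s"Expected ${schema.length} values but got ${values.length}")
    val typed = values.zip(schema.fields).map { case (v, f) => convert(v, f.dataType) }
    val rowRdd = spark.sparkContext.parallelize(Seq(Row(typed: _*)))
    spark.createDataFrame(rowRdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("list-to-df").getOrCreate()
    // Hypothetical schema standing in for df.schema from the question.
    val schema = StructType(Seq(
      StructField("name", StringType),
      StructField("gender", StringType),
      StructField("score", IntegerType),
      StructField("place", StringType),
      StructField("zip", LongType)))
    val rowValues = List("ann", "f", "90", "world", "23456")
    val newRow = rowFromStrings(spark, rowValues, schema)
    newRow.show()
    spark.stop()
  }
}
```

On Spark 1.x, the same idea should work with sqlContext.createDataFrame and sqlContext.sparkContext in place of the SparkSession calls. The resulting newRow can then be appended with df.unionAll(newRow), or with df.union(newRow) on Spark 2.x and later.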
