How to convert RDD[Row] to RDD[String] (Scala)
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/44067476/
Asked by Vickyster
I have a DataFrame called source, loaded from a MySQL table:
val source = sqlContext.read.jdbc(jdbcUrl, "source", connectionProperties)
I have converted it to an RDD with:
val sourceRdd = source.rdd
But it is an RDD[Row], and I need an RDD[String] to do transformations like:
source.map(rec => (rec.split(",")(0).toInt, rec)), .subtractByKey(), etc..
Thank you
Answered by Haroun Mohammedi
You can use the Row.mkString(sep: String): String method in a map call, like this:
val sourceRdd = source.rdd.map(_.mkString(","))
You can change the "," parameter to whatever separator you want.
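As a rough sketch of how this fits the question's goal, the snippet below builds small in-memory DataFrames (standing in for the JDBC table, so the column names and values are made up), turns each Row into a comma-separated line with mkString, keys each line by its first field, and applies subtractByKey. It assumes Spark 2.x or later; the object and method names (MkStringSketch, keyedLines, run) are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object MkStringSketch {
  // Turn every Row into one comma-separated line, then key the line by its first field.
  def keyedLines(df: DataFrame): RDD[(Int, String)] =
    df.rdd
      .map(_.mkString(","))                        // RDD[Row] -> RDD[String]
      .map(rec => (rec.split(",")(0).toInt, rec))  // assumes the first column is an integer id

  // Illustration only: two tiny DataFrames replace the real JDBC tables.
  def run(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._
    val source = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("id", "name")
    val target = Seq((2, "bob")).toDF("id", "name")
    // Keep only the source lines whose key does not appear in target.
    keyedLines(source)
      .subtractByKey(keyedLines(target))
      .collect()
      .sortBy(_._1)
      .toSeq
  }
}
```

One caveat with this approach: if a column value itself contains the separator, splitting the line later will cut it in the wrong place, so pick a separator that cannot occur in the data.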
Hope this helps. Best regards.
Answered by T. Gawęda
What is your schema?
If it's just a String, you can use:
import spark.implicits._
val sourceDS = source.as[String]
val sourceRdd = sourceDS.rdd // will give RDD[String]
Note: in Spark 1.6, use sqlContext instead of spark. Here spark is a SparkSession, a class introduced in Spark 2.0 as the new entry point to SQL functionality. In Spark 2.x it should be used instead of SQLContext.
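For illustration, here is a minimal sketch of the as[String] route. It assumes Spark 2.x or later and a DataFrame with exactly one string column; the column name "line" and the names AsStringSketch, singleColumnToRdd and run are made up for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object AsStringSketch {
  // Intended for a DataFrame that has a single column of string type.
  def singleColumnToRdd(spark: SparkSession, df: DataFrame): RDD[String] = {
    import spark.implicits._ // brings the Encoder[String] needed by as[String] into scope
    df.as[String].rdd
  }

  // Illustration only: a one-column DataFrame replaces the real table.
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val source = Seq("1,alice", "2,bob").toDF("line")
    singleColumnToRdd(spark, source).collect().sorted.toSeq
  }
}
```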
You can also create your own case classes.
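The case class idea might look roughly like the sketch below. The schema (an integer id and a string name) is an assumption, since the question does not show the table's columns; SourceRecord, CaseClassSketch and run are hypothetical names.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical schema: field names and types must match the DataFrame's columns.
case class SourceRecord(id: Int, name: String)

object CaseClassSketch {
  def run(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._
    // Illustration only: an in-memory DataFrame replaces the real table.
    val source = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    // Convert to a typed Dataset first, then drop down to an RDD of case class instances.
    val typed: RDD[SourceRecord] = source.as[SourceRecord].rdd
    // Fields are now accessed by name, so no string splitting is needed to build the key.
    typed.map(r => (r.id, r.name)).collect().sortBy(_._1).toSeq
  }
}
```

With typed records, the keyed RDD needed for subtractByKey can be built from r.id directly instead of parsing a delimited string.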
You can also map the rows. Here source is a DataFrame, and we pass a partial function to map:
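import org.apache.spark.sql.Row
val sourceRdd = source.rdd.map { case x : Row => x(0).asInstanceOf[String] }.map(s => s.split(",")) // assumes column 0 is a String; the split makes this RDD[Array[String]]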
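As a variation on the row mapping above, the sketch below reads the first column with Row's typed getter getString and skips rows where that column is null, using isNullAt. This is an alternative to the asInstanceOf cast, not what the answer itself wrote; the data and the names RowGetterSketch, firstColumn and run are invented for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowGetterSketch {
  // Read column 0 as a String, dropping rows where it is null.
  def firstColumn(df: DataFrame): RDD[String] =
    df.rdd
      .filter(row => !row.isNullAt(0))
      .map(row => row.getString(0))

  // Illustration only: the second row has a null in the first column.
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val data: Seq[(String, Int)] = Seq(("1,alice", 1), (null, 2), ("2,bob", 3))
    val source = data.toDF("line", "n")
    firstColumn(source).collect().sorted.toSeq
  }
}
```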

