Spark Scala:检索模式并存储它

Disclaimer: This page is a Chinese–English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/37400697/

Date: 2020-10-22 08:19:25  Source: igfitidea

Spark Scala: retrieve the schema and store it

Tags: scala, apache-spark, apache-spark-sql, spark-dataframe

Asked by Edamame

Is it possible to retrieve the schema of an RDD and store it in a variable? Because I want to create a new data frame from another RDD using the same schema. For example, below is what I am hoping to have:


val schema = oldDF.getSchema()
val newDF = sqlContext.createDataFrame(rowRDD, schema)

Assuming I already have rowRDD in the format RDD[org.apache.spark.sql.Row], is this possible?


Answered by 5ba86145

Just use the schema attribute:

import org.apache.spark.sql.Row

val oldDF = sqlContext.createDataFrame(sc.parallelize(Seq(("a", 1))))
val rowRDD = sc.parallelize(Seq(Row("b", 2)))

// oldDF.schema is a StructType; pass it directly when building the new DataFrame
sqlContext.createDataFrame(rowRDD, oldDF.schema)
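
Since the question also asks about storing the schema, it may be worth noting that a StructType can be round-tripped through JSON, which is useful if the schema must be persisted (e.g. to a file) and reloaded later. A minimal sketch, assuming the same oldDF and a Spark 1.x-style SQLContext as in the answer above:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// The schema is an ordinary value, so it can be held in a variable...
val schema: StructType = oldDF.schema

// ...or serialized to JSON and restored later with DataType.fromJson
val json: String = schema.json
val restored: StructType = DataType.fromJson(json).asInstanceOf[StructType]
```

The restored StructType compares equal to the original and can be passed to createDataFrame just like oldDF.schema.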