scala - How to declare an empty dataset in Spark?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must likewise follow CC BY-SA, cite the original address, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/46296442/

Date: 2020-10-22 09:26:39  Source: igfitidea

How to declare an empty dataset in Spark?

Tags: scala, apache-spark, apache-spark-sql

Asked by Hassan Ali

I am new to Spark and Spark datasets. I was trying to declare an empty dataset using emptyDataset, but it was asking for an org.apache.spark.sql.Encoder. The data type I am using for the dataset is an object of case class Tp(s1: String, s2: String, s3: String).


Accepted answer by Vitalii Kotliarenko

All you need is to import the implicit encoders from the SparkSession instance before you create the empty Dataset: import spark.implicits._. See the full example here.

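Putting the accepted answer together with the question's case class, a minimal sketch (assuming Spark 2.x+, where SparkSession provides the implicit encoders):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Define the case class at the top level so Spark can derive an Encoder[Tp]
case class Tp(s1: String, s2: String, s3: String)

object EmptyDatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Empty-Dataset")
      .master("local[*]")
      .getOrCreate()

    // Bring the implicit Encoders (including Encoder[Tp]) into scope
    import spark.implicits._

    // emptyDataset requires an implicit Encoder[Tp]; the import above supplies it
    val emptyDs: Dataset[Tp] = spark.emptyDataset[Tp]

    emptyDs.show()           // prints only the header row
    println(emptyDs.count()) // 0

    // An equivalent way to get an empty, typed Dataset
    val alsoEmpty: Dataset[Tp] = Seq.empty[Tp].toDS()
    println(alsoEmpty.count()) // 0

    spark.stop()
  }
}
```

Without the import, the compiler reports that no implicit Encoder[Tp] is available, which is the error described in the question.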

Answered by Hassan Ali

EmptyDataFrame


package com.examples.sparksql

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object EmptyDataFrame {

  def main(args: Array[String]): Unit = {

    //Create Spark Conf
    val sparkConf = new SparkConf().setAppName("Empty-Data-Frame").setMaster("local")

    //Create Spark Context - sc
    val sc = new SparkContext(sparkConf)

    //Create Sql Context
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)    

    //Import Sql Implicit conversions
    import sqlContext.implicits._
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructType,StructField,StringType}   

    //Create Schema RDD
    val schema_string = "name,id,dept"
    val schema_rdd = StructType(schema_string.split(",").map(fieldName => StructField(fieldName, StringType, true)) )

    //Create Empty DataFrame
    val empty_df = sqlContext.createDataFrame(sc.emptyRDD[Row], schema_rdd)

    //Some Operations on Empty Data Frame
    empty_df.show()
    println(empty_df.count())     

    //You can register a temp table on the empty DataFrame, though it's an empty table
    //(registerTempTable is deprecated in Spark 2.x in favor of createOrReplaceTempView)
    empty_df.registerTempTable("empty_table")

    //let's check it ;)
    val res = sqlContext.sql("select * from empty_table")
    res.show

  }

}
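The SQLContext-based code above predates Spark 2.x. A sketch of the same empty-DataFrame construction with the current SparkSession entry point (assuming Spark 2.x+):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object EmptyDataFrameModern {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Empty-Data-Frame")
      .master("local[*]")
      .getOrCreate()

    // Same schema as the original example: three nullable string columns
    val schema = StructType(
      "name,id,dept".split(",").map(StructField(_, StringType, nullable = true))
    )

    // Same idea as the SQLContext version: an empty RDD[Row] plus an explicit schema
    val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    emptyDf.show()
    println(emptyDf.count()) // 0

    // createOrReplaceTempView replaces the deprecated registerTempTable
    emptyDf.createOrReplaceTempView("empty_table")
    spark.sql("select * from empty_table").show()

    spark.stop()
  }
}
```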