如何为自定义 Java 对象创建编码器?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/39188504/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 04:09:36  来源:igfitidea点击:

How to create encoder for custom Java objects?

javaapache-sparkspark-javaapache-spark-2.0

提问by Pradeep

I am using following class to create bean from Spark Encoders

我正在使用以下类从 Spark 编码器创建 bean

Class OuterClass implements Serializable {
    int id;
    ArrayList<InnerClass> listofInner;

    public int getId() {
        return id;
    }

    public void setId (int num) {
        this.id = num;
    }

    public ArrayList<InnerClass> getListofInner() {
        return listofInner;
    }

    public void setListofInner(ArrayList<InnerClass> list) {
        this.listofInner = list;
    }
}

public static class InnerClass implements Serializable {
    String streetno;

    public void setStreetno(String streetno) {
        this.streetno= streetno;
    }

    public String getStreetno() {
        return streetno;
    }
}

Encoder<OuterClass> outerClassEncoder = Encoders.bean(OuterClass.class);
Dataset<OuterClass> ds = spark.createDataset(Collections.singeltonList(outerclassList), outerClassEncoder)

And I am getting the following error

我收到以下错误

Exception in thread "main" java.lang.UnsupportedOperationException: Cannot infer type for class OuterClass$InnerClass because it is not bean-compliant

How can I implement this type of usecase for spark in java? This worked fine if I remove the inner class. But I need to have an inner class for my use case.

如何在 Java 中为 Spark 实现这种类型的用例?如果我删除内部类,这很好用。但是我需要为我的用例创建一个内部类。

采纳答案by abaghel

Your JavaBean class should have a public no-argument constructor, getter and setters and it should implement Serializable interface. Spark SQL works on valid JavaBean class.

您的 JavaBean 类应该有一个公共的无参数构造函数、getter 和 setter,并且它应该实现 Serializable 接口。Spark SQL 适用于有效的 JavaBean 类。

EDIT: Adding working sample with inner class

编辑:使用内部类添加工作示例

OuterInnerDF.java

外部内部DF.java

package com.abaghel.examples;

import java.util.ArrayList;
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import com.abaghel.examples.OuterClass.InnerClass;

public class OuterInnerDF {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
            .builder()
            .appName("OuterInnerDF")
            .config("spark.sql.warehouse.dir", "/file:C:/temp")
            .master("local[2]")
            .getOrCreate();

     System.out.println("====> Create DataFrame");
     //Outer
     OuterClass us = new OuterClass();
     us.setId(111);     
     //Inner
     OuterClass.InnerClass ic = new OuterClass.InnerClass();
     ic.setStreetno("My Street");
     //list
     ArrayList<InnerClass> ar = new ArrayList<InnerClass>();
     ar.add(ic);         
     us.setListofInner(ar);  
     //DF
     Encoder<OuterClass> outerClassEncoder = Encoders.bean(OuterClass.class);        
     Dataset<OuterClass> ds = spark.createDataset(Collections.singletonList(us), outerClassEncoder);
     ds.show();
    }
}

OuterClass.java

外部类.java

package com.abaghel.examples;

import java.io.Serializable;
import java.util.ArrayList;

public class OuterClass implements Serializable {
int id;
ArrayList<InnerClass> listofInner;

public int getId() {
    return id;
}

public void setId(int num) {
    this.id = num;
}

public ArrayList<InnerClass> getListofInner() {
    return listofInner;
}

public void setListofInner(ArrayList<InnerClass> list) {
    this.listofInner = list;
}

public static class InnerClass implements Serializable {
    String streetno;

    public void setStreetno(String streetno) {
        this.streetno = streetno;
    }

    public String getStreetno() {
        return streetno;
      }
    }
}

Console Output

控制台输出

====> Create DataFrame
16/08/28 18:02:55 INFO CodeGenerator: Code generated in 32.516369 ms
+---+-------------+
| id|  listofInner|
+---+-------------+
|111|[[My Street]]|
+---+-------------+