Convert RDD to Dataset in Java Spark

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45326796/

Time: 2020-11-03 08:38:31  Source: igfitidea

Tags: java, apache-spark

Asked by vdep

I have an RDD that I need to convert into a Dataset. I tried:

Dataset<Person> personDS =  sqlContext.createDataset(personRDD, Encoders.bean(Person.class));

The line above throws this error:

cannot resolve method createDataset(org.apache.spark.api.java.JavaRDD<Main.Person>, org.apache.spark.sql.Encoder<T>)

However, I can get a Dataset by converting to a DataFrame first. The code below works:

Dataset<Row> personDF = sqlContext.createDataFrame(personRDD, Person.class);
Dataset<Person> personDS = personDF.as(Encoders.bean(Person.class));

Answered by vdep

.createDataset() accepts an RDD<T>, not a JavaRDD<T>. JavaRDD is a wrapper around RDD that makes calls from Java code easier. It holds the RDD internally, and that RDD can be accessed using .rdd(). The following creates a Dataset:

Dataset<Person> personDS =  sqlContext.createDataset(personRDD.rdd(), Encoders.bean(Person.class));
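
To illustrate why the compiler rejects the wrapper, here is a minimal plain-Java sketch that needs no Spark on the classpath. Rdd, JavaRdd, and this createDataset are hypothetical stand-ins invented for the example, not Spark's real classes; they only mimic the relationship described above, where the wrapper holds the underlying object instead of extending it.

```java
import java.util.Arrays;
import java.util.List;

public class WrapperSketch {
    // Hypothetical stand-in for the Scala-side RDD<T>; not Spark's real class.
    public static final class Rdd<T> {
        public final List<T> items;
        public Rdd(List<T> items) { this.items = items; }
    }

    // Hypothetical stand-in for JavaRDD<T>: it holds an Rdd<T> but does not extend it.
    public static final class JavaRdd<T> {
        private final Rdd<T> inner;
        public JavaRdd(Rdd<T> inner) { this.inner = inner; }
        // Mirrors the idea of .rdd(): hand back the wrapped object.
        public Rdd<T> rdd() { return inner; }
    }

    // Stand-in for createDataset: declared for Rdd<T> only, so a JavaRdd<T> does not match.
    public static <T> List<T> createDataset(Rdd<T> rdd) {
        return rdd.items;
    }

    public static void main(String[] args) {
        JavaRdd<String> people = new JavaRdd<>(new Rdd<>(Arrays.asList("Ann", "Bob")));
        // createDataset(people);  // would not compile: JavaRdd<String> is not an Rdd<String>
        List<String> ds = createDataset(people.rdd()); // unwrap first, then the call resolves
        System.out.println(ds);
    }
}
```

Because the wrapper is not a subtype of the wrapped class, the call only resolves once the inner object is unwrapped, which is what personRDD.rdd() does in the line above.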

Answered by Chitral Verma

On your RDD, use .toDS() and you will get a Dataset.

Let me know if it helps. Cheers.

Answered by Manishankar Singh

In addition to the accepted answer, if you want to create a Dataset<Row> instead of a Dataset<Person> in Java, try this:

StructType yourStruct = ...; // Build your own StructType from the individual field types
// Note: personRDD must be a JavaRDD<Row> here, because RowEncoder yields an Encoder<Row>
Dataset<Row> personDS = sqlContext.createDataset(personRDD.rdd(), RowEncoder.apply(yourStruct));