Convert RDD to Dataset in Java Spark
Original question: http://stackoverflow.com/questions/45326796/
This page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must apply the same license, cite the original URL, and attribute the content to the original authors on StackOverFlow.
Asked by vdep
I have an RDD that I need to convert into a Dataset. I tried:
Dataset<Person> personDS = sqlContext.createDataset(personRDD, Encoders.bean(Person.class));
The line above fails with the error:
cannot resolve method createDataset(org.apache.spark.api.java.JavaRDD<Main.Person>, org.apache.spark.sql.Encoder<T>)
However, I can get a Dataset by first converting the RDD to a DataFrame. The code below works:
Dataset<Row> personDF = sqlContext.createDataFrame(personRDD, Person.class);
Dataset<Person> personDS = personDF.as(Encoders.bean(Person.class));
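The Person class is not shown in the question. Encoders.bean(...) works with JavaBean-style classes, meaning a no-argument constructor plus a getter/setter pair per field. A minimal sketch of such a bean follows; the name and age fields are assumptions chosen purely for illustration, not taken from the original post:

```java
import java.io.Serializable;

// Hypothetical Person bean; the fields (name, age) are illustrative assumptions.
// In a real Spark project the class would typically be declared public
// (for example in its own Person.java, or as a public static nested class);
// it is package-private here only so the sketch compiles as a single file.
class Person implements Serializable {
    private String name;
    private int age;

    // A no-argument constructor is part of the JavaBean convention.
    public Person() { }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

With a bean shaped like this, each getter/setter pair would be expected to map to one column (here name and age) of the resulting Dataset.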
Answered by vdep
.createDataset() accepts RDD<T>, not JavaRDD<T>. JavaRDD is a wrapper around RDD that makes calls from Java code easier. It holds an RDD internally, which can be accessed using .rdd(). The following creates a Dataset:
Dataset<Person> personDS = sqlContext.createDataset(personRDD.rdd(), Encoders.bean(Person.class));
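Why the unwrap is needed can be seen in a small Spark-free model. The sketch below uses hypothetical stand-in types (SimpleRdd, SimpleJavaRdd, countItems); none of them are Spark classes, they only mimic the relationship between RDD, JavaRDD and a method that declares the unwrapped type as its parameter:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the underlying RDD<T>; NOT the real Spark class.
class SimpleRdd<T> {
    private final List<T> items;
    SimpleRdd(List<T> items) { this.items = items; }
    List<T> collect() { return items; }
}

// Hypothetical stand-in for JavaRDD<T>: it wraps a SimpleRdd and exposes it via rdd().
// The wrapper is a separate type, not a subclass of SimpleRdd.
class SimpleJavaRdd<T> {
    private final SimpleRdd<T> inner;
    SimpleJavaRdd(SimpleRdd<T> inner) { this.inner = inner; }
    SimpleRdd<T> rdd() { return inner; }
}

class WrapperDemo {
    // Like the createDataset overload discussed above, this takes the unwrapped type.
    static <T> int countItems(SimpleRdd<T> rdd) {
        return rdd.collect().size();
    }

    public static void main(String[] args) {
        SimpleJavaRdd<String> wrapped =
            new SimpleJavaRdd<>(new SimpleRdd<>(Arrays.asList("a", "b", "c")));
        // countItems(wrapped);  // would not compile: no overload takes the wrapper
        System.out.println(countItems(wrapped.rdd())); // unwrap first; prints 3
    }
}
```

Because the wrapper and the wrapped type are unrelated in the type hierarchy, the compiler cannot resolve the call until .rdd() is used, which mirrors the "cannot resolve method" error in the question.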
Answered by Chitral Verma
Call .toDS() on your RDD and you will get a Dataset.
Let me know if it helps. Cheers.
Answered by Manishankar Singh
In addition to the accepted answer, if you want to create a Dataset<Row> instead of a Dataset<Person> in Java, try this:
StructType yourStruct = ...; // build your own StructType from the individual field types
Dataset<Row> personDS = sqlContext.createDataset(personRDD.rdd(), RowEncoder.apply(yourStruct));