Java: Initialize an RDD to empty

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/33472829/

Time: 2020-08-11 14:42:35  Source: igfitidea

Initialize an RDD to empty

java, apache-spark, rdd

Asked by Chaitra Bannihatti

I have an RDD declared as:

JavaPairRDD<String, List<String>> existingRDD; 

Now I need to initialize existingRDD to an empty RDD so that, when I get the actual RDDs, I can union them with existingRDD. How do I initialize existingRDD to an empty RDD rather than to null? Here is my code:

JavaPairRDD<String, List<String>> existingRDD;
if(ai.get()%10==0)
{
    existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/",
    NullWritable.class, Text.class, TextOutputFormat.class); //on worker failure this will get overwritten                                  
}
else
{
    existingRDD.union(rdd);
}

Answered by eliasah

To create an empty RDD in Java, you just need to do the following:

// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();

I trust you know how to use generics. If not, for your case you'll need:

JavaRDD<Tuple2<String,List<String>>> emptyRDD = jsc.emptyRDD();
JavaPairRDD<String,List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(
  emptyRDD
);

You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
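
One detail in the question's code matters when applying this: RDDs are immutable, and union returns a new RDD instead of modifying the receiver, so a bare existingRDD.union(rdd) has no lasting effect. Below is a minimal sketch of the accumulate-from-empty pattern. It uses plain Java lists as a stand-in for RDDs so that it runs without Spark; the class name, the union and accumulate helpers, and the sample batches are all hypothetical and not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the RDD pattern; no Spark classes are used here.
class EmptyAccumulatorSketch {

    // Hypothetical analogue of RDD.union: builds and returns a NEW list
    // and leaves both inputs unchanged.
    static <T> List<T> union(List<T> left, List<T> right) {
        List<T> merged = new ArrayList<>(left.size() + right.size());
        merged.addAll(left);
        merged.addAll(right);
        return Collections.unmodifiableList(merged);
    }

    // Folds every batch into an accumulator that starts empty instead of null.
    static <T> List<T> accumulate(List<List<T>> batches) {
        List<T> existing = Collections.emptyList();
        for (List<T> batch : batches) {
            // Reassign: calling union(existing, batch) alone would discard the result.
            existing = union(existing, batch);
        }
        return existing;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(accumulate(batches)); // prints [a, b, c]
    }
}
```

With Spark, the same shape would be existingRDD = existingRDD.union(rdd), starting from an empty pair RDD built as in the answer above.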

In Scala:

val sc: SparkContext = ???
... 
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...

Answered by ???

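In Scala, I used the parallelize method. The sequence passed in must itself be empty: Seq("") would produce an RDD containing one empty string instead of an empty RDD.

val emptyRDD = sc.parallelize(Seq.empty[String])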

Answered by Nikhil Bhide

@eliasah's answer is very useful; here is code to create an empty pair RDD. Suppose you need an empty pair RDD of (key, value) tuples. The following Scala code creates one with String keys and Int values.

type pairRDD = (String,Int)
var resultRDD = sparkContext.emptyRDD[pairRDD]

The RDD is created as follows:

resultRDD: org.apache.spark.rdd.EmptyRDD[(String, Int)] = EmptyRDD[0] at emptyRDD at <console>:29

Answered by Thiago Mata

In Java, creating the empty RDD was a little complex. I tried using scala.reflect.ClassTag, but that did not work either. After many tests, the code that worked turned out to be even simpler.

private JavaRDD<Foo> getEmptyJavaRdd() {

    /* This did not compile for me because emptyRDD required <T> as a parameter. */
    //        JavaRDD<Foo> emptyRDD = sparkContext.emptyRDD();
    //        return emptyRDD;

    /* This should be the solution that emulates the Scala <T>, */
    /* but I could not make it work either. */
    //        ClassTag<Foo> tag = scala.reflect.ClassTag$.MODULE$.apply(Foo.class);
    //        return sparkContext.emptyRDD(tag);

    /* This alternative worked in Java 8. */
    /* parallelize(java.util.List) is an instance method of JavaSparkContext, */
    /* so sparkContext is assumed to be a JavaSparkContext here. */
    return sparkContext.parallelize(
            java.util.Arrays.asList()
    );

}

Answered by Thirupathi Chavati

val emptyRdd=sc.emptyRDD[String]

The statement above creates an empty RDD whose element type is String.

From the SparkContext class:

Get an RDD that has no partitions or elements.

def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T] (this)

Answered by Satya

In Java, create an empty pair RDD as follows:

// jsc is assumed to be a JavaSparkContext instance; emptyRDD() is not a static method.
JavaPairRDD<K, V> emptyPairRDD = JavaPairRDD.fromJavaRDD(jsc.<Tuple2<K, V>>emptyRDD());