Java: Why does starting StreamingContext fail with “IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute”?

Disclaimer: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/24519660/

Date: 2020-08-14 12:43:22  Source: igfitidea

Why does starting StreamingContext fail with “IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute”?

Tags: java, apache-spark, spark-streaming

Asked by Ananth Duari

I'm trying to execute a Spark Streaming example with Twitter as the source as follows:

public static void main(String... args) {

    SparkConf conf = new SparkConf().setAppName("Spark_Streaming_Twitter").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(2));
    JavaSQLContext sqlCtx = new JavaSQLContext(sc);

    String[] filters = new String[] {"soccer"};

    JavaReceiverInputDStream<Status> receiverStream = TwitterUtils.createStream(jssc, filters);

    jssc.start();
    jssc.awaitTermination();
}

But I'm getting the following exception

Exception in thread "main" java.lang.AssertionError: assertion failed: No output streams registered, so nothing to execute
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.streaming.DStreamGraph.validate(DStreamGraph.scala:158)
    at org.apache.spark.streaming.StreamingContext.validate(StreamingContext.scala:416)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:437)
    at org.apache.spark.streaming.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:501)
    at org.learning.spark.TwitterStreamSpark.main(TwitterStreamSpark.java:53)

Any suggestions on how to fix this issue?

Accepted answer by Jigar Parekh

When an output operator is called, it triggers the computation of a stream.

Without an output operator on the DStream, no computation is invoked. Basically, you need to invoke one of the following methods on the stream:

print()
foreachRDD(func)
saveAsObjectFiles(prefix, [suffix])
saveAsTextFiles(prefix, [suffix])
saveAsHadoopFiles(prefix, [suffix])

http://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations

You can also apply any transformations first and then the output operation, if required.
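
A minimal sketch based on the question's code (the map to Status.getText is only an illustrative transformation, not something from the original question):

JavaDStream<String> statuses = receiverStream.map(Status::getText); // transformation (optional)
statuses.print();         // output operation - registers an output stream on the graph
jssc.start();             // validation now succeeds because an output operation is registered
jssc.awaitTermination();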

Answer by user1995400

It also fails -wrongly- with this error when the real cause is that the slide/window durations of the streaming input and the RDD time windows are not multiples of each other. That mismatch only logs a warning: you fix it, and the context stops failing :D
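
For illustration (hypothetical durations, not taken from the original answer): with a 2-second batch interval, window and slide durations have to be multiples of that interval, for example:

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2)); // batch interval: 2s
JavaDStream<Status> windowed = receiverStream.window(
        Durations.seconds(10),   // window length: a multiple of the 2s batch interval
        Durations.seconds(4));   // slide interval: also a multiple of the 2s batch interval
windowed.print();                // an output operation is still required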

Answer by Jacek Laskowski

Exception in thread "main" java.lang.AssertionError: assertion failed: No output streams registered, so nothing to execute

TL;DR Use one of the available output operators like print, saveAsTextFiles or foreachRDD (or the less often used saveAsObjectFiles or saveAsHadoopFiles).

In other words, you have to use an output operator between the following lines in your code:

JavaReceiverInputDStream<Status> receiverStream = TwitterUtils.createStream(jssc, filters);
// --> The output operator here <--
jssc.start();
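
For instance, the placeholder could be filled with a foreachRDD call. A sketch (the count/println body is only illustrative, and the lambda form assumes a Spark version whose foreachRDD accepts a VoidFunction):

receiverStream.foreachRDD(rdd ->
        System.out.println("Tweets in this batch: " + rdd.count())); // output operation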

Quoting the Spark official documentation's Output Operations on DStreams (highlighting mine):

Output operations allow DStream's data to be pushed out to external systems like a database or a file system. Since the output operations actually allow the transformed data to be consumed by external systems, they trigger the actual execution of all the DStream transformations (similar to actions for RDDs).

The point is that without an output operator you have "no output streams registered, so nothing to execute".

As one commenter has noticed, you have to use an output transformation, e.g. print or foreachRDD, before starting the StreamingContext.


Internally, whenever you use one of the available output operators, e.g. print or foreach, DStreamGraph is requested to add an output stream.

You can find the registration where a new ForEachDStream is created and registered afterwards (which is exactly what adds it as an output stream).
