Timeout exception in Apache Spark during program execution (Scala)

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, keep the link to the original, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/40740750/

Date: 2020-10-22 08:51:35  Source: igfitidea

Timeout Exception in Apache-Spark during program Execution

Tags: scala, apache-spark, spark-graphx, apache-spark-2.0

Asked by Yasir

I am running a Bash script on a Mac. This script calls a Spark method written in Scala a large number of times. I am currently trying to call this Spark method 100,000 times using a for loop.


The code exits with the following exception after a small number of iterations, around 3,000.


org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518)
    at org.apache.spark.executor.Executor$$anon$$anonfun$run.apply$mcV$sp(Executor.scala:547)
    at org.apache.spark.executor.Executor$$anon$$anonfun$run.apply(Executor.scala:547)
    at org.apache.spark.executor.Executor$$anon$$anonfun$run.apply(Executor.scala:547)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877)
    at org.apache.spark.executor.Executor$$anon.run(Executor.scala:547)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)

Exception in thread "dag-scheduler-event-loop" 16/11/22 13:37:32 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
    at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:126)
    at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:221)
    at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:259)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:346)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: Java heap space
    at java.util.regex.Pattern.compile(Pattern.java:1047)
    at java.lang.String.replace(String.java:2180)
    at org.apache.spark.util.Utils$.getFormattedClassName(Utils.scala:1728)
    at org.apache.spark.storage.RDDInfo$$anonfun.apply(RDDInfo.scala:57)
    at org.apache.spark.storage.RDDInfo$$anonfun.apply(RDDInfo.scala:57)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:57)
    at org.apache.spark.scheduler.StageInfo$$anonfun.apply(StageInfo.scala:87)

Can someone help, please? Is this error caused by the large number of calls to the Spark method?


Answer by Ram Ghadiyaram

It's an RpcTimeoutException, so spark.network.timeout (spark.rpc.askTimeout) could be tuned to larger-than-default values in order to handle complex workloads. You can start with these values and adjust them according to your workloads. Please see the latest documentation:


spark.network.timeout (default: 120s) – Default timeout for all network interactions. This config will be used in place of spark.core.connection.ack.wait.timeout, spark.storage.blockManagerSlaveTimeoutMs, spark.shuffle.io.connectionTimeout, spark.rpc.askTimeout or spark.rpc.lookupTimeout if they are not configured.


Also consider increasing the executor memory, i.e. spark.executor.memory, and most importantly review your code to check whether it is a candidate for further optimization.


Solution: the value 600 below is an example; base it on your requirements.


Set via SparkConf: conf.set("spark.network.timeout", "600s") (a fuller sketch follows below)
Set via spark-defaults.conf: spark.network.timeout 600s
Set when calling spark-submit: --conf spark.network.timeout=600s
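
For reference, here is a minimal Scala sketch of the SparkConf/SparkSession route. The application name and the 600s value are illustrative, not taken from the question; adjust them to your workload.

    import org.apache.spark.sql.SparkSession

    object TimeoutTuningExample {
      def main(args: Array[String]): Unit = {
        // Build a session with a longer RPC/network timeout; this value also
        // stands in for spark.rpc.askTimeout and friends when they are not set explicitly.
        val spark = SparkSession.builder()
          .appName("timeout-tuning-example")        // hypothetical app name
          .config("spark.network.timeout", "600s")
          .getOrCreate()

        // ... run the Spark method here ...

        spark.stop()
      }
    }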

Answer by Sandeep Purohit

The above stack trace also shows a Java heap space OOM error, so first try increasing the memory and running it again. As for the timeout, it is an RPC timeout, so you can set spark.network.timeout to a timeout value according to your needs.


Answer by Prem S

Please increase the executor memory so that the OOM goes away, or else change your code so that your RDD won't have a big memory footprint (see the sketch below).


--executor-memory 3G

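A hedged sketch of what shrinking the RDD footprint could look like, assuming a SparkSession named spark is already in scope (e.g. in spark-shell); the input path and the parsing logic are made up for illustration.

    import org.apache.spark.storage.StorageLevel

    val lines = spark.sparkContext.textFile("hdfs:///tmp/input.csv")   // hypothetical path

    // Project down to only the column you need before caching, and let Spark
    // spill serialized blocks to disk instead of keeping deserialized objects on the heap.
    val slim = lines.map(_.split(",")(0))
    slim.persist(StorageLevel.MEMORY_AND_DISK_SER)

    val distinctKeys = slim.distinct().count()   // some work that reuses slim

    // Release the cached blocks as soon as they are no longer needed.
    slim.unpersist()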

Answer by Luckylukee

Just increase spark.executor.heartbeatInterval to 20s; that is the setting the error message points to.

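For completeness, a sketch of setting that property programmatically. The values are illustrative; note that the Spark documentation advises keeping spark.executor.heartbeatInterval significantly smaller than spark.network.timeout, otherwise executors can be marked dead before a heartbeat ever arrives.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "20s")
      .set("spark.network.timeout", "300s")   // illustrative; must stay larger than the heartbeat interval

    // Pass conf to your SparkContext / SparkSession builder as usual.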

Answer by akl

You are seeing this issue due to the executor memory. Try increasing the memory (e.g. doubling it) so the containers don't time out while waiting on the remaining containers.
