Spark runs on Yarn cluster exitCode=13:
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/36535411/
Asked by user_not_found
I am a Spark/YARN newbie and ran into exitCode=13 when I submitted a Spark job on the YARN cluster. When the Spark job runs in local mode, everything is fine.
The command I used is:
/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode cluster --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar
Spark Error Log:
16/04/12 17:59:30 INFO Client:
client token: N/A
diagnostics: Application application_1459460037715_23007 failed 2 times due to AM Container for appattempt_1459460037715_23007_000002 exited with exitCode: 13
For more detailed output, check application tracking page: http://b-r06f2-prod.phx2.cpe.net:8088/cluster/app/application_1459460037715_23007 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e40_1459460037715_23007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
Yarn logs:
16/04/12 23:55:35 INFO mapreduce.TableInputFormatBase: Input split length: 977 M bytes.
16/04/12 23:55:41 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:55:51 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:01 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x152f0b4fc0e7488
16/04/12 23:56:11 INFO zookeeper.ZooKeeper: Session: 0x152f0b4fc0e7488 closed
16/04/12 23:56:11 INFO zookeeper.ClientCnxn: EventThread shut down
16/04/12 23:56:11 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 2003 bytes result sent to driver
16/04/12 23:56:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 82134 ms on localhost (2/3)
16/04/12 23:56:17 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x4508c270df09803
16/04/12 23:56:17 INFO zookeeper.ZooKeeper: Session: 0x4508c270df09803 closed
...
16/04/12 23:56:21 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
16/04/12 23:56:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Timed out waiting for SparkContext.)
16/04/12 23:56:21 INFO spark.SparkContext: Invoking stop() from shutdown hook
Answered by user1314742
It seems that you have set the master in your code to be local
SparkConf.setMaster("local[*]")
You have to leave the master unset in the code, and set it later when you issue spark-submit:
spark-submit --master yarn-client ...
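For illustration, here is a minimal sketch (the object and app names are hypothetical, not from the original post) of driver code that leaves the master unset so that spark-submit can supply --master:

import org.apache.spark.{SparkConf, SparkContext}

object SparkTest {
  def main(args: Array[String]): Unit = {
    // No setMaster() call here: the master comes from spark-submit (--master yarn),
    // so the same jar can run locally or on the cluster without code changes.
    val conf = new SparkConf().setAppName("sparkTest")
    val sc = new SparkContext(conf)

    // ... job logic ...

    sc.stop()
  }
}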
Answered by Jhon Mario Lotero
If it helps someone
Another possibility for this error is that you passed the --class parameter incorrectly.
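As an illustration (the package and object name below are taken from the question's submit command; the rest is a hypothetical sketch), the value given to --class has to be the fully qualified name of the main object that is actually packaged in the jar:

// Inside the jar built from the project:
package com.test

object sparkTest {
  def main(args: Array[String]): Unit = {
    // ... build the SparkConf/SparkContext and run the job ...
  }
}

// so the matching submit flag is: --class com.test.sparkTest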
Answered by Sahas
I had exactly the same problem, but the above answer didn't work.
Alternatively, when I ran this with spark-submit --deploy-mode client, everything worked fine.
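For reference, this is what the question's submit command would look like in client mode (a workaround rather than a root-cause fix: in client mode the driver runs on the submitting machine instead of inside the YARN ApplicationMaster):

/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode client --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar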
Answered by Scrotch
I got this same error running a SparkSQL job in cluster mode. None of the other solutions worked for me, but looking in the job history server logs in Hadoop I found this stack trace.
20/02/05 23:01:24 INFO hive.metastore: Connected to metastore.
20/02/05 23:03:03 ERROR yarn.ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply(ApplicationMaster.scala:245)
...
and looking at the Spark source code, you'll find that basically the AM timed out waiting for the spark.driver.port property to be set by the thread executing the user class.
So it could either be a transient issue or you should investigate your code for the reason for a timeout.
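One common way to hit this timeout (a sketch of my own, not from the original answer; names are hypothetical) is doing heavy work in main() before the SparkSession/SparkContext is created, so spark.driver.port is never set within the AM's wait window:

import org.apache.spark.sql.SparkSession

object SparkSqlJob {
  def main(args: Array[String]): Unit = {
    // Create the session (and thus the SparkContext) first, so the AM sees
    // the driver come up and spark.driver.port get set promptly...
    val spark = SparkSession.builder()
      .appName("SparkSqlJob")
      .enableHiveSupport()
      .getOrCreate()

    // ...and only then do any slow, non-Spark setup work.
    // val df = spark.sql("SELECT ...")

    spark.stop()
  }
}

If the initialization genuinely needs more time, raising spark.yarn.am.waitTime (whose default matches the 100000 ms seen in the log above) via --conf may help, though that treats the symptom rather than the cause.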

