Spark runs on Yarn cluster exitCode=13:
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/36535411/
Asked by user_not_found
I am a Spark/YARN newbie and ran into exitCode=13 when I submitted a Spark job on the YARN cluster. When the Spark job runs in local mode, everything is fine.
The command I used is:
/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode cluster --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar
Spark Error Log:
16/04/12 17:59:30 INFO Client:
client token: N/A
diagnostics: Application application_1459460037715_23007 failed 2 times due to AM Container for appattempt_1459460037715_23007_000002 exited with exitCode: 13
For more detailed output, check application tracking page: http://b-r06f2-prod.phx2.cpe.net:8088/cluster/app/application_1459460037715_23007 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e40_1459460037715_23007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
Yarn logs:
16/04/12 23:55:35 INFO mapreduce.TableInputFormatBase: Input split length: 977 M bytes.
16/04/12 23:55:41 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:55:51 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:01 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x152f0b4fc0e7488
16/04/12 23:56:11 INFO zookeeper.ZooKeeper: Session: 0x152f0b4fc0e7488 closed
16/04/12 23:56:11 INFO zookeeper.ClientCnxn: EventThread shut down
16/04/12 23:56:11 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 2003 bytes result sent to driver
16/04/12 23:56:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 82134 ms on localhost (2/3)
16/04/12 23:56:17 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x4508c270df09803
16/04/12 23:56:17 INFO zookeeper.ZooKeeper: Session: 0x4508c270df09803 closed
...
16/04/12 23:56:21 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
16/04/12 23:56:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Timed out waiting for SparkContext.)
16/04/12 23:56:21 INFO spark.SparkContext: Invoking stop() from shutdown hook
Answered by user1314742
It seems that you have set the master in your code to be local
SparkConf.setMaster("local[*]")
You have to leave the master unset in the code, and set it later when you issue spark-submit:
spark-submit --master yarn-client ...
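For illustration, here is a minimal sketch (the object and app names are hypothetical, not from the original post) of driver code that leaves the master unset so that spark-submit can supply --master:

import org.apache.spark.{SparkConf, SparkContext}

object SparkTest {
  def main(args: Array[String]): Unit = {
    // No setMaster() call here: the master comes from spark-submit (--master yarn),
    // so the same jar can run locally or on the cluster without code changes.
    val conf = new SparkConf().setAppName("sparkTest")
    val sc = new SparkContext(conf)

    // ... job logic ...

    sc.stop()
  }
}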
Answered by Jhon Mario Lotero
If it helps someone
Another possibility for this error is that you passed the --class parameter incorrectly.
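As an illustration (the package and object name below are taken from the question's submit command; the rest is a hypothetical sketch), the value given to --class has to be the fully qualified name of the main object that is actually packaged in the jar:

// Inside the jar built from the project:
package com.test

object sparkTest {
  def main(args: Array[String]): Unit = {
    // ... build the SparkConf/SparkContext and run the job ...
  }
}

// so the matching submit flag is: --class com.test.sparkTest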
Answered by Sahas
I had exactly the same problem, but the above answer didn't work.
Alternatively, when I ran this with spark-submit --deploy-mode client, everything worked fine.
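For reference, this is what the question's submit command would look like in client mode (a workaround rather than a root-cause fix: in client mode the driver runs on the submitting machine instead of inside the YARN ApplicationMaster):

/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode client --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar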
Answered by Scrotch
I got this same error running a SparkSQL job in cluster mode. None of the other solutions worked for me, but looking in the job history server logs in Hadoop I found this stack trace.
20/02/05 23:01:24 INFO hive.metastore: Connected to metastore.
20/02/05 23:03:03 ERROR yarn.ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run.apply(ApplicationMaster.scala:245)
...
and looking at the Spark source code, you'll find that basically the AM timed out waiting for the spark.driver.port property to be set by the thread executing the user class.
So it could either be a transient issue or you should investigate your code for the reason for a timeout.
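One common way to hit this timeout (a sketch of my own, not from the original answer; names are hypothetical) is doing heavy work in main() before the SparkSession/SparkContext is created, so spark.driver.port is never set within the AM's wait window:

import org.apache.spark.sql.SparkSession

object SparkSqlJob {
  def main(args: Array[String]): Unit = {
    // Create the session (and thus the SparkContext) first, so the AM sees
    // the driver come up and spark.driver.port get set promptly...
    val spark = SparkSession.builder()
      .appName("SparkSqlJob")
      .enableHiveSupport()
      .getOrCreate()

    // ...and only then do any slow, non-Spark setup work.
    // val df = spark.sql("SELECT ...")

    spark.stop()
  }
}

If the initialization genuinely needs more time, raising spark.yarn.am.waitTime (whose default matches the 100000 ms seen in the log above) via --conf may help, though that treats the symptom rather than the cause.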

