Where are the Spark logs on EMR?

Disclaimer: this Q&A is reproduced from StackOverflow under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/30494905/



Tags: scala, apache-spark, emr

Asked by Sean Bollin

I'm not able to locate error logs or messages from println calls in Scala while running jobs on Spark in EMR.


Where can I access these?


I'm submitting the Spark job, written in Scala, to EMR using script-runner.jar, with the arguments --deploy-mode set to cluster and --master set to yarn. It runs the job fine.


However, I do not see my println statements in the Amazon EMR UI where it lists stderr, stdout, etc. Furthermore, if my job errors, I don't see why it had an error. All I see is this in the stderr:


15/05/27 20:24:44 INFO yarn.Client: Application report from ResourceManager: 
 application identifier: application_1432754139536_0002
 appId: 2
 clientToAMToken: null
 appDiagnostics: 
 appMasterHost: ip-10-185-87-217.ec2.internal
 appQueue: default
 appMasterRpcPort: 0
 appStartTime: 1432758272973
 yarnAppState: FINISHED
 distributedFinalState: FAILED
 appTrackingUrl: http://10.150.67.62:9046/proxy/application_1432754139536_0002/A
 appUser: hadoop
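
For context, the behavior described in the question follows from where the code runs: with --deploy-mode cluster, println calls in the driver execute inside the YARN Application Master container, and println calls inside RDD operations execute on the executors, so neither shows up in the step's own stdout/stderr. The following is a minimal hypothetical sketch (the object name and messages are illustrative, not from the original post):

    import org.apache.spark.{SparkConf, SparkContext}

    object LogDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("LogDemo"))

        // Runs in the driver. With --deploy-mode cluster the driver lives in
        // the YARN Application Master container, so this lands in that
        // container's stdout, not in the EMR step's stdout.
        println("driver: job starting")

        val doubled = sc.parallelize(1 to 4).map { i =>
          // Runs on an executor; lands in that executor container's stdout.
          println(s"executor: processing $i")
          i * 2
        }.collect()

        doubled.foreach(x => println(s"driver: result $x")) // driver again

        sc.stop()
      }
    }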


Accepted answer by ChristopherB

With the deploy mode of cluster on YARN, the Spark driver (and hence the user code being executed) runs within the Application Master container. It sounds like you had EMR debugging enabled on the cluster, so the logs should also have been pushed to S3. In the S3 location, look at task-attempts/<applicationid>/<firstcontainer>/*.
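
As a practical follow-on, a hedged alternative to println is Spark's own log4j logging, since logger output goes to the container log files, i.e. the same YARN container logs that EMR archives to S3 when debugging/log archiving is enabled. A minimal sketch, assuming the default log4j configuration on EMR (the object name is hypothetical):

    import org.apache.log4j.Logger

    object MyJob {
      // Lazily initialized so the logger is created on the JVM where it is used.
      lazy val log: Logger = Logger.getLogger(getClass.getName)

      def run(): Unit = {
        // Goes through Spark's log4j appenders, so on EMR it lands in the
        // YARN container logs that get archived to S3.
        log.info("visible in the Application Master container log")
      }
    }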


Answered by Anders Hammar

If you SSH into the master node of your cluster, you should be able to find the stdout, stderr, syslog, and controller logs under:


/mnt/var/log/hadoop/steps/<stepname>

Answered by randal25

The event logs, the ones required for the spark-history-server, can be found at:


hdfs:///var/log/spark/apps
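
If event logging is not already on (EMR normally preconfigures it), a minimal sketch for enabling it explicitly from Scala might look like the following; the directory is the EMR default quoted above, and the app name is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp") // placeholder name
      .set("spark.eventLog.enabled", "true")                    // write event logs
      .set("spark.eventLog.dir", "hdfs:///var/log/spark/apps")  // EMR default shown above
    val sc = new SparkContext(conf)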

Answered by Holden

If you submit your job with emr-bootstrap, you can specify the log directory as an S3 bucket with --log-uri.
