scala - Where are the Spark logs on EMR?
Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you want to use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original author (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/30494905/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Where are the Spark logs on EMR?
Asked by Sean Bollin
I'm not able to locate error logs or messages from println calls in Scala while running jobs on Spark in EMR.
Where can I access these?
I'm submitting the Spark job, written in Scala, to EMR using script-runner.jar with the arguments --deploy-mode set to cluster and --master set to yarn. It runs the job fine.
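For context, that kind of submission ultimately boils down to a spark-submit call run by script-runner.jar, roughly like the sketch below (the class name, bucket, and jar path are placeholders, not the actual values from my job):

spark-submit --deploy-mode cluster --master yarn \
  --class com.example.MySparkJob \
  s3://my-bucket/jars/my-spark-job.jar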
However, I do not see my println statements in the Amazon EMR UI where it lists stderr, stdout, etc. Furthermore, if my job errors I don't see why it had an error. All I see is this in the stderr:
15/05/27 20:24:44 INFO yarn.Client: Application report from ResourceManager:
application identifier: application_1432754139536_0002
appId: 2
clientToAMToken: null
appDiagnostics:
appMasterHost: ip-10-185-87-217.ec2.internal
appQueue: default
appMasterRpcPort: 0
appStartTime: 1432758272973
yarnAppState: FINISHED
distributedFinalState: FAILED
appTrackingUrl: http://10.150.67.62:9046/proxy/application_1432754139536_0002/A
appUser: hadoop
Accepted answer by ChristopherB
With the deploy mode of cluster on yarn, the Spark driver, and hence the user code being executed, will be within the Application Master container. It sounds like you had EMR debugging enabled on the cluster, so logs should also have been pushed to S3. In the S3 location, look at task-attempts/<applicationid>/<firstcontainer>/*.
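As a rough illustration of how to pull those logs (the application id below is the one from the question; the bucket and prefix are placeholders, not known values):

# from the master node, while YARN still knows about the application
yarn logs -applicationId application_1432754139536_0002
# or browse the cluster's S3 log location once the logs have been pushed
aws s3 ls s3://my-emr-log-bucket/my-log-prefix/ --recursive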
Answered by Anders Hammar
If you SSH into the master node of your cluster then you should be able to find the stdout, stderr, syslog and controller logs under:
/mnt/var/log/hadoop/steps/<stepname>
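For example, assuming you have the cluster's key pair (the key file, host name, and step id below are placeholders):

ssh -i ~/my-emr-key.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
ls /mnt/var/log/hadoop/steps/
cat /mnt/var/log/hadoop/steps/s-1234567890ABC/stderr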
Answered by randal25
The event logs, the ones required for the spark-history-server, can be found at:
hdfs:///var/log/spark/apps
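That directory typically corresponds to Spark's spark.eventLog.dir / spark.history.fs.logDirectory settings in the cluster's spark-defaults.conf; a minimal sketch of listing it from the master node:

hdfs dfs -ls hdfs:///var/log/spark/apps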
Answered by Holden
If you submit your job with emr-bootstrap you can specify the log directory as an S3 bucket with --log-uri.
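One common place --log-uri appears is when the cluster itself is created with the AWS CLI; a minimal sketch (release label, instance settings, and bucket are illustrative placeholders, not from this answer):

aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-5.33.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --log-uri s3://my-bucket/emr-logs/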

