How to get the applicationId of a Spark application deployed to YARN in Scala?
Source: http://stackoverflow.com/questions/34588192/ (StackOverflow, licensed under CC BY-SA 4.0; attribute the original authors when reusing)
Asked by nish1013
I'm using the following Scala code (as a custom spark-submit wrapper) to submit a Spark application to a YARN cluster:
val result = Seq(spark_submit_script_here).!!
All I have at the time of submission is spark-submit and the Spark application's jar (no SparkContext). I'd like to capture the applicationId from result, but result is empty.
In my command-line output I can see the applicationId along with the rest of the YARN messages:
INFO yarn.Client: Application report for application_1450268755662_0110
How can I read that output from within my code and get the applicationId?
Accepted answer by Markon
As stated in Spark issue 5439, you can either use SparkContext.applicationId or parse the stderr output. Since you are wrapping the spark-submit command with your own script/object, I would say you need to read stderr and extract the application ID from it.
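One way to read stderr from a Scala wrapper is scala.sys.process.ProcessLogger, which receives stdout and stderr lines separately, whereas !! returns only stdout. The following is a minimal sketch, not a definitive implementation: the object name YarnAppId, its method names, and the regular expression are illustrative assumptions, and it assumes the application ID appears in the process output in the form shown in the question.

```scala
import scala.sys.process._

// Hypothetical helper (names are illustrative, not part of Spark).
object YarnAppId {
  // Assumed ID shape: application_<cluster timestamp>_<sequence number>.
  private val AppIdPattern = """application_\d+_\d+""".r

  /** Returns the first YARN application ID found in the text, if any. */
  def extract(text: String): Option[String] = AppIdPattern.findFirstIn(text)

  /** Runs the command and returns (exit code, first application ID seen on stdout or stderr). */
  def runAndCapture(command: Seq[String]): (Int, Option[String]) = {
    // StringBuffer is synchronized; stdout and stderr are read on separate threads.
    val output = new StringBuffer
    val logger = ProcessLogger(
      (line: String) => { output.append(line).append('\n'); () }, // stdout lines
      (line: String) => { output.append(line).append('\n'); () }  // stderr lines
    )
    val exitCode = command.!(logger) // blocks until the process exits
    (exitCode, extract(output.toString))
  }
}
```

In the question's wrapper this would be called as YarnAppId.runAndCapture(Seq(spark_submit_script_here)) in place of Seq(spark_submit_script_here).!!. Unlike !!, the ! method does not throw on a non-zero exit code, so the caller should check the returned exit code.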
Answer by Rajiv
If you are submitting the job via Python, this is how you can get the YARN application ID:
import re
import subprocess

# `config` and `app_name` come from the answerer's project and are not shown here.
cmd_list = [{
    'cmd': '/usr/bin/spark-submit --name %s --master yarn --deploy-mode cluster '
           '--executor-memory %s --executor-cores %s --num-executors %s '
           '--class %s %s %s'
           % (
               app_name,
               config.SJ_EXECUTOR_MEMORY,
               config.SJ_EXECUTOR_CORES,
               config.SJ_NUM_OF_EXECUTORS,
               config.PRODUCT_SNAPSHOT_SKU_PRESTO_CLASS,
               config.SPARK_JAR_LOCATION,
               config.SPARK_LOGGING_ENABLED
           ),
    'cwd': config.WORK_DIR
}]

for cmd_obj in cmd_list:
    # The application ID is parsed from stderr, so capture that stream.
    cmd_output = subprocess.run(cmd_obj['cmd'], shell=True, check=True,
                                cwd=cmd_obj['cwd'], stderr=subprocess.PIPE)
    cmd_output = cmd_output.stderr.decode("utf-8")
    # The trailing sequence number can exceed four digits, hence {4,}.
    yarn_application_ids = re.findall(r"application_\d{13}_\d{4,}", cmd_output)
    if yarn_application_ids:
        yarn_application_id = yarn_application_ids[0]
        yarn_command = "yarn logs -applicationId " + yarn_application_id
Answer by mmopu
If a Spark context is available (for example in spark-shell), use it to get the application ID:
sc.getConf.getAppId
res7: String = application_1532296406128_16555

