Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34588192/

How to get applicationId of Spark application deployed to YARN in Scala?

Tags: scala, apache-spark, yarn

Asked by nish1013

I'm using the following Scala code (as a custom spark-submit wrapper) to submit a Spark application to a YARN cluster:

val result = Seq(spark_submit_script_here).!!

All I have at the time of submission is spark-submit and the Spark application's jar (no SparkContext). I'd like to capture the applicationId from result, but it's empty.

I can see the applicationId and the rest of the YARN messages in my command line output:

INFO yarn.Client: Application report for application_1450268755662_0110

How can I read it from within my code and get the applicationId?

Accepted answer by Markon

As stated in Spark issue 5439, you can either use SparkContext.applicationId or parse the stderr output. Since you are wrapping the spark-submit command with your own script/object and have no SparkContext, you need to read stderr and extract the application ID from it.
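A minimal Scala sketch of the stderr-parsing approach follows. It relies on the fact that `!!` in `scala.sys.process` returns only the child's stdout, which would explain the empty `result` when the YARN client messages are logged to stderr. The names `YarnAppId`, `extractApplicationId`, and `submitAndGetAppId` are illustrative and not part of Spark or the original post; the regular expression assumes IDs of the form shown in the log line in the question.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.sys.process._

object YarnAppId {
  // Assumed id shape: application_<cluster timestamp>_<sequence number>.
  private val AppIdPattern = """application_\d+_\d+""".r

  /** Returns the first YARN application id found in the given text, if any. */
  def extractApplicationId(text: String): Option[String] =
    AppIdPattern.findFirstIn(text)

  /** Runs the command, collects stdout and stderr lines, and returns
    * the exit code together with the first application id seen. */
  def submitAndGetAppId(command: Seq[String]): (Int, Option[String]) = {
    val lines = ArrayBuffer.empty[String]
    def append(line: String): Unit = lines.synchronized { lines += line; () }

    // ProcessLogger's first callback receives stdout lines, the second stderr lines.
    val logger = ProcessLogger(line => append(line), line => append(line))
    val exitCode = command.!(logger) // blocks until the process exits
    (exitCode, extractApplicationId(lines.mkString("\n")))
  }
}
```

With this in place, the wrapper could call something like `YarnAppId.submitAndGetAppId(Seq(spark_submit_script_here))` instead of `Seq(spark_submit_script_here).!!`. Note that the call blocks until spark-submit exits, so in cluster mode the ID only becomes available once the client process returns.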

Answer by Rajiv

If you are submitting the job via Python, this is how you can get the YARN application ID:

import re
import subprocess

# `app_name` and `config` come from the answerer's project and are not shown here.
cmd_list = [{
    'cmd': '/usr/bin/spark-submit --name %s --master yarn --deploy-mode cluster '
           '--executor-memory %s --executor-cores %s --num-executors %s '
           '--class %s %s %s'
           % (
               app_name,
               config.SJ_EXECUTOR_MEMORY,
               config.SJ_EXECUTOR_CORES,
               config.SJ_NUM_OF_EXECUTORS,
               config.PRODUCT_SNAPSHOT_SKU_PRESTO_CLASS,
               config.SPARK_JAR_LOCATION,
               config.SPARK_LOGGING_ENABLED
           ),
    'cwd': config.WORK_DIR
}]

for cmd_obj in cmd_list:
    # Capture stderr, which is where the YARN client messages are read from.
    cmd_output = subprocess.run(cmd_obj['cmd'], shell=True, check=True,
                                cwd=cmd_obj['cwd'], stderr=subprocess.PIPE)
    stderr_text = cmd_output.stderr.decode("utf-8")
    # The sequence number can exceed four digits, so match \d+ rather than \d{4}.
    yarn_application_ids = re.findall(r"application_\d+_\d+", stderr_text)
    if yarn_application_ids:
        yarn_application_id = yarn_application_ids[0]
        yarn_command = "yarn logs -applicationId " + yarn_application_id

Answer by mmopu

Use the SparkContext to get the application info:

sc.getConf.getAppId 
res7: String = application_1532296406128_16555