Scala: how to pass a -D parameter or environment variable to a Spark job?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/28166667/
How to pass -D parameter or environment variable to Spark job?
Asked by kopiczko
I want to change the Typesafe config of a Spark job in dev/prod environments. It seems to me that the easiest way to accomplish this is to pass -Dconfig.resource=ENVNAME to the job. Then the Typesafe config library will do the job for me.
Is there a way to pass that option directly to the job? Or maybe there is a better way to change the job config at runtime?
EDIT:
- Nothing happens when I add the --conf "spark.executor.extraJavaOptions=-Dconfig.resource=dev" option to the spark-submit command.
- I got Error: Unrecognized option '-Dconfig.resource=dev'. when I passed -Dconfig.resource=dev to the spark-submit command.
Answered by kopiczko
Change the spark-submit command line by adding three options:
--files <location_to_your_app.conf>
--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
--conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
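For reference, here is a minimal sketch (with a hypothetical config key, not from the answer) of the application side: once -Dconfig.resource or -Dconfig.file reaches the driver and executor JVMs via extraJavaOptions, a plain ConfigFactory.load() reads that configuration source instead of the default application.conf.

import com.typesafe.config.ConfigFactory

object MyJob {
  def main(args: Array[String]): Unit = {
    // ConfigFactory.load() honours the config.resource / config.file system properties.
    val config = ConfigFactory.load()
    val envName = config.getString("env.name") // hypothetical key defined in the shipped .conf
    println(s"Running with environment: $envName")
  }
}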
Answered by giaosudau
Here is how my Spark program is run with additional Java options:
/home/spark/spark-1.6.1-bin-hadoop2.6/bin/spark-submit \
--files /home/spark/jobs/fact_stats_ad.conf \
--conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf' \
--class jobs.DiskDailyJob \
--packages com.databricks:spark-csv_2.10:1.4.0 \
--jars /home/spark/jobs/alluxio-core-client-1.2.0-RC2-jar-with-dependencies.jar \
--driver-memory 2g \
/home/spark/jobs/convert_to_parquet.jar \
AD_COOKIE_REPORT FACT_AD_STATS_DAILY | tee /data/fact_ad_stats_daily.log
As you can see:
- the custom config file: --files /home/spark/jobs/fact_stats_ad.conf
- the executor Java options: --conf spark.executor.extraJavaOptions=-Dconfig.fuction.conf
- the driver Java options: --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH -Dalluxio.user.file.write.location.policy.class=alluxio.client.file.policy.MostAvailableFirstPolicy -Dconfig.file=/home/spark/jobs/fact_stats_ad.conf'
Hope it helps.
Answered by Demi Ben-Ari
I had a lot of problems with passing -D parameters to Spark executors and the driver. I've added a quote from my blog post about it:

"The right way to pass the parameter is through the properties "spark.driver.extraJavaOptions" and "spark.executor.extraJavaOptions": I've passed both the log4j configuration property and the parameter that I needed for the configurations. (To the driver I was able to pass only the log4j configuration.)
For example (written in a properties file passed to spark-submit with "--properties-file"):

spark.driver.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -
spark.executor.extraJavaOptions -Dlog4j.configuration=file:///spark/conf/log4j.properties -Dapplication.properties.file=hdfs:///some/path/on/hdfs/app.properties
spark.application.properties.file hdfs:///some/path/on/hdfs/app.properties
"
You can read my blog post about the overall configuration of Spark. I'm running on Yarn as well.
Answered by linehrr
--files <location_to_your_app.conf>
--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=app'
--conf 'spark.driver.extraJavaOptions=-Dconfig.resource=app'
If you write it this way, the later --conf will overwrite the previous one; you can verify this by looking at the Spark UI under the Environment tab after the job has started.
So the correct way is to put the options on the same line, like this:
--conf 'spark.executor.extraJavaOptions=-Da=b -Dc=d'
If you do this, you will find that all your settings are shown in the Spark UI.
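To check that the -D options actually reached the executor JVMs, a small sketch like the following can help (the property names a and c come from the example above; the rest is an assumption, not part of the answer):

import org.apache.spark.sql.SparkSession

object CheckJavaOptions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("check-java-options").getOrCreate()
    val sc = spark.sparkContext

    // Driver-side view of the system properties.
    println(s"driver: a=${sys.props.get("a")}, c=${sys.props.get("c")}")

    // Executor-side view: each task reports what it sees.
    val seen = sc.parallelize(1 to sc.defaultParallelism)
      .map(_ => (sys.props.get("a"), sys.props.get("c")))
      .distinct()
      .collect()
    seen.foreach { case (a, c) => println(s"executor: a=$a, c=$c") }

    spark.stop()
  }
}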
Answered by tgpfeiffer
I am starting my Spark application via a spark-submit command launched from within another Scala application. So I have an Array like
Array(".../spark-submit", ..., "--conf", confValues, ...)
where confValues is:
- for yarn-cluster mode: "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=..."
- for local[*] mode: "run.mode=development"
It is a bit tricky to understand where (not) to escape quotes and spaces, though. You can check the Spark web interface for system property values.
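For completeness, here is a minimal sketch (assumed paths and class names, not from the answer) of driving spark-submit from another Scala application; the key point is that the whole "spark.driver.extraJavaOptions=..." pair is passed as a single array element, so no shell quoting is involved:

import scala.sys.process._

object SubmitFromScala {
  def main(args: Array[String]): Unit = {
    // Hypothetical value mirroring the yarn-cluster case above.
    val confValues = "spark.driver.extraJavaOptions=-Drun.mode=production -Dapp.param=42"

    val cmd = Seq(
      "/opt/spark/bin/spark-submit",   // assumed spark-submit location
      "--master", "yarn",
      "--deploy-mode", "cluster",
      "--conf", confValues,
      "--class", "com.example.MyJob",  // hypothetical main class
      "/opt/jobs/my-job.jar"           // hypothetical application jar
    )

    val exitCode = Process(cmd).!      // run and wait for completion
    println(s"spark-submit exited with $exitCode")
  }
}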
Answered by Yoga Gowda
spark-submit --driver-java-options "-Denv=DEV -Dmode=local" --class co.xxx.datapipeline.jobs.EventlogAggregator target/datapipeline-jobs-1.0-SNAPSHOT.jar
The above command works for me:
-Denv=DEV => to read the DEV env properties file, and -Dmode=local => to create the SparkContext locally - .setMaster("local[*]")
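A minimal sketch (the property names env and mode come from the command above; the per-environment file naming is an assumption) of how the driver code might react to those -D options:

import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.SparkSession

object EnvAwareJob {
  def main(args: Array[String]): Unit = {
    val env  = sys.props.getOrElse("env", "DEV")      // set via --driver-java-options "-Denv=DEV"
    val mode = sys.props.getOrElse("mode", "cluster") // set via --driver-java-options "-Dmode=local"

    // Assumed convention: one config file per environment on the classpath, e.g. application-dev.conf.
    val config = ConfigFactory.load(s"application-${env.toLowerCase}")

    val builder = SparkSession.builder().appName("env-aware-job")
    val spark =
      if (mode == "local") builder.master("local[*]").getOrCreate()
      else builder.getOrCreate()

    // ... job logic using `config` and `spark` ...
    spark.stop()
  }
}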
Answered by Nitesh Saxena
Use the method shown in the command below; it may be helpful for you:
spark-submit --master local[2] --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod
I have tried it and it worked for me. I would also suggest going through the Spark documentation linked below, which is really helpful - https://spark.apache.org/docs/latest/running-on-yarn.html
Answered by nemo
I originally had this config file:
my-app {
  environment: dev
  other: xxx
}
This is how I'm loading my config in my Spark Scala code:
import java.io.File
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseFile(new File("my-app.conf"))
  .withFallback(ConfigFactory.load())
  .resolve()
  .getConfig("my-app")
With this setup, despite what the Typesafe Config documentation and all the other answers say, the system property override didn't work for me when I launched my spark job like so:
spark-submit \
--master yarn \
--deploy-mode cluster \
--name my-app \
--driver-java-options='-XX:MaxPermSize=256M -Dmy-app.environment=prod' \
--files my-app.conf \
my-app.jar
To get it to work I had to change my config file to:
my-app {
  environment: dev
  environment: ${?env.override}
  other: xxx
}
and then launch it like so:
spark-submit \
--master yarn \
--deploy-mode cluster \
--name my-app \
--driver-java-options='-XX:MaxPermSize=256M -Denv.override=prod' \
--files my-app.conf \
my-app.jar
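To see why the ${?env.override} pattern works, here is a minimal standalone sketch (outside Spark, assuming my-app.conf is on the classpath): the optional substitution only overrides environment when the env.override system property is actually set, otherwise the dev default survives.

import com.typesafe.config.ConfigFactory

object OverrideDemo {
  def main(args: Array[String]): Unit = {
    // Simulate --driver-java-options '-Denv.override=prod'.
    System.setProperty("env.override", "prod")
    ConfigFactory.invalidateCaches() // make sure the fresh system property is picked up

    val config = ConfigFactory.load("my-app").getConfig("my-app")
    println(config.getString("environment")) // "prod"; without the property it stays "dev"
  }
}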

