How to specify which Java version to use in the spark-submit command?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/36863599/

Tags: java, yarn, spark-streaming

Asked by Priyanka

I want to run a Spark Streaming application on a YARN cluster on a remote server. The default Java version is 1.7, but I want to use 1.8 for my application; it is also installed on the server but is not the default. Is there a way to specify the location of Java 1.8 through spark-submit so that I do not get a major.minor version error?

Answer by mathieu

JAVA_HOME was not enough in our case: the driver was running on Java 8, but I discovered later that the Spark workers in YARN were launched using Java 7 (the Hadoop nodes have both Java versions installed).

I had to add spark.executorEnv.JAVA_HOME=/usr/java/<version available in workers> in spark-defaults.conf. Note that you can also provide it on the command line with --conf.

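For example (a sketch; the JDK path below is hypothetical and must match what is actually installed on the worker nodes):

# In spark-defaults.conf:
spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_121

# Or equivalently on the command line:
spark-submit --conf "spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0_121" ...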

See http://spark.apache.org/docs/latest/configuration.html#runtime-environment

Answer by Radu

Although you can force the driver code to run on a particular Java version (export JAVA_HOME=/path/to/jre/ && spark-submit ...), the workers will execute the code with the default Java version from the yarn user's PATH on the worker machines.

What you can do is set each Spark instance to use a particular JAVA_HOME by editing the spark-env.sh files (see the documentation).

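A minimal sketch of the spark-env.sh entry (the path is an example; point it at the JDK actually installed on each node):

# In conf/spark-env.sh on every node:
export JAVA_HOME=/usr/java/jdk1.8.0_121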

Answer by Masterbuilder

If you want to set the Java environment for Spark on YARN, you can set it when invoking spark-submit:

--conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
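
For context, a sketch of how this flag fits into a full command (the class name and jar path are placeholders):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0_121 \
  --class com.example.MyApp \
  myapp.jar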

Answer by Carlos Gomez

Add the JAVA_HOME that you want in spark-env.sh (locate the file with sudo find -name spark-env.sh, e.g.: /etc/spark2/conf.cloudera.spark2_on_yarn/spark-env.sh)

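For example, one way to locate the file (a sketch; the search root and error redirection are my additions):

sudo find / -name spark-env.sh 2>/dev/null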

Answer by Avinash Ganta

The Java version would need to be set for both the Spark App Master and the Spark Executors which will be launched on YARN. Thus the spark-submit command must include two JAVA_HOME settings: spark.executorEnv.JAVA_HOME and spark.yarn.appMasterEnv.JAVA_HOME

spark-submit --class com.example.DataFrameExample \
  --conf "spark.executorEnv.JAVA_HOME=/jdk/jdk1.8.0_162" \
  --conf "spark.yarn.appMasterEnv.JAVA_HOME=/jdk/jdk1.8.0_162" \
  --master yarn --deploy-mode client \
  /spark/programs/DataFrameExample/target/scala-2.12/dfexample_2.12-1.0.jar