How do I pass program arguments to the main function when running spark-submit with a JAR?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/36024565/

java, apache-spark

Asked by Eric Na

I know this is a trivial question, but I could not find the answer on the internet.

I am trying to run a Java class's main function with program arguments (String[] args).

However, when I submit the job using spark-submit and pass program arguments as I would do with

java -cp <some jar>.jar <Some class name> <arg1> <arg2>

it does not read the args.

The command I tried running was

bin/spark-submit analytics-package.jar --class full.package.name.ClassName 1234 someargument someArgument

and this gives

Error: No main class set in JAR; please specify one with --class

and when I tried:

bin/spark-submit --class full.package.name.ClassName 1234 someargument someArgument analytics-package.jar 

I get

Warning: Local jar /mnt/disk1/spark/1 does not exist, skipping.
java.lang.ClassNotFoundException: com.relcy.analytics.query.QueryAnalytics
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:183)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:208)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:122)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How can I pass these arguments? They change frequently on each run of the job, and they need to be passed as arguments.

Accepted answer by Matt Clark

Arguments passed before the .jar file will be arguments to the JVM, whereas arguments passed after the jar file will be passed on to the user's program.

bin/spark-submit --class classname -Xms256m -Xmx1g something.jar someargument

Here, s will equal someargument, whereas -Xms -Xmx is passed into the JVM.

public class ClassName { // the class named by --class (name is illustrative)
    public static void main(String[] args) {
        String s = args[0]; // "someargument" in the example above
    }
}

Answered by Eric Na

I found the correct command from this tutorial.

The command should be of the form:

bin/spark-submit --class full.package.name.ClassName analytics-package.jar someargument someArgument
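
To illustrate, here is a minimal sketch of how full.package.name.ClassName could consume the three arguments that follow the jar. The meaning of each argument and the JavaSparkContext setup are assumptions for illustration, not the asker's actual code:

package full.package.name;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ClassName {
    public static void main(String[] args) {
        // With the command above, args = {"1234", "someargument", "someArgument"}
        int id = Integer.parseInt(args[0]); // hypothetical: treat "1234" as a numeric id
        String first = args[1];
        String second = args[2];

        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("ClassName"));
        System.out.println("id=" + id + ", first=" + first + ", second=" + second);
        sc.stop();
    }
}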

Answered by Sushruth

spark-submit --class SparkWordCount --master yarn --jars <jar1.jar>,<jar2.jar> \
  sparkwordcount-1.0.jar /user/user01/input/alice.txt /user/user01/output
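
For context, a minimal sketch of what the SparkWordCount class behind this command might look like. The actual implementation isn't shown in the answer; this version assumes the Spark 2.x Java API:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // The two paths that follow the jar on the command line arrive here:
        String inputPath = args[0];  // /user/user01/input/alice.txt
        String outputPath = args[1]; // /user/user01/output

        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("SparkWordCount"));
        JavaPairRDD<String, Integer> counts = sc.textFile(inputPath)
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);
        counts.saveAsTextFile(outputPath);
        sc.stop();
    }
}

Note that the two paths come after the jar, so they arrive as args[0] and args[1], while --master and --jars come before it and are consumed by spark-submit itself.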

Answered by rahul

The first unrecognized argument is treated as the primaryResource (the jar file, in our case). Check out SparkSubmitArguments.handleUnknown.

All the arguments after the primaryResource are treated as arguments to the application. Check out SparkSubmitArguments.handleExtraArgs.

To better understand how the arguments are parsed, check out SparkSubmitOptionParser.parse; the two methods above are called from this method.
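
Putting this together, here is a simplified toy re-implementation of that parsing order. This is not Spark's actual code; among other simplifications, it assumes every -- option takes a value, which isn't true of flags like --verbose:

import java.util.ArrayList;
import java.util.List;

public class ParseOrderSketch {
    public static void main(String[] argv) {
        List<String> sparkOpts = new ArrayList<>(); // consumed by spark-submit itself
        List<String> appArgs = new ArrayList<>();   // forwarded to the application's main
        String primaryResource = null;              // the application jar

        for (int i = 0; i < argv.length; i++) {
            if (primaryResource != null) {
                appArgs.add(argv[i]);                     // cf. handleExtraArgs
            } else if (argv[i].startsWith("--")) {
                sparkOpts.add(argv[i] + " " + argv[++i]); // recognized option plus its value
            } else {
                primaryResource = argv[i];                // cf. handleUnknown
            }
        }
        System.out.println("spark options:    " + sparkOpts);
        System.out.println("primary resource: " + primaryResource);
        System.out.println("application args: " + appArgs);
    }
}

Run against the failing command's arguments (analytics-package.jar --class full.package.name.ClassName 1234 ...), the jar is seen first, so --class and everything after it become application arguments and spark-submit never learns the main class, which is exactly the "No main class set in JAR" error above.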