Scala: Passing Arguments in Apache Spark
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/27403571/
Passing Arguments in Apache Spark
Asked by monster
I am running this code on a local machine:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/Users/username/Spark/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
I'd like to run the program on different files - it currently only runs on README.md. How do I pass the file path of another file (or any other argument, for that matter) when running Spark? For example, I'd like to change contains("a") to another letter.
I run the program with:
$ YOUR_SPARK_HOME/bin/spark-submit \
--class "SimpleApp" \
--master local[4] \
target/scala-2.10/simple-project_2.10-1.0.jar
Thanks!
Answered by suiterdev
When you set up your main as
def main(args: Array[String]) {
you are preparing your main to accept anything after the .jar path as arguments. They are collected into an array named 'args' for you, and you then access them as usual with args(n).
It is a good idea to check your arguments for type and/or format; this matters especially if anyone other than you might run the program.
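For example, a minimal sketch of such a check inside main (the usage message and exit code are illustrative assumptions, not part of the original answer):

// Illustrative sketch: fail fast if no file path was passed on the command line.
if (args.length < 1) {
  System.err.println("Usage: SimpleApp <logFile>") // hypothetical usage string
  sys.exit(1)
}
val logFile = args(0)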
So instead of setting
val logFile = "String here"
set it to
val logFile = args(0)
and then pass the file as the first argument. Check the spark-submit docs for more on that, but basically you just enter it on the command line after the jar path.
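Putting it together, a minimal sketch of the modified program (the second argument for the search letter is an illustrative assumption extending the original answer):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    // args(0) is the file to analyze; args(1), if given, is the letter to count (illustrative)
    val logFile = args(0)
    val letter = if (args.length > 1) args(1) else "a"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val count = logData.filter(line => line.contains(letter)).count()
    println("Lines with %s: %s".format(letter, count))
  }
}

It could then be submitted with the extra arguments after the jar path, for example:

$ YOUR_SPARK_HOME/bin/spark-submit \
--class "SimpleApp" \
--master local[4] \
target/scala-2.10/simple-project_2.10-1.0.jar \
/Users/username/Spark/README.md b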
Answered by Spidey Praful
Replace the value of the logFile variable with the following:
val logFile = args(0)
Then pass the actual value as an argument when running spark-submit, like below:
spark-submit --class "SimpleApp" --master local target/scala-2.10/simpleapp_2.10-1.0.jar "/Users/username/Spark/README.md"

