spark-submit for a .scala file
Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/47663695/
Asked by Codejoy
I have been running some test Spark Scala code with spark-shell, probably in a bad way:
spark-shell --conf spark.neo4j.bolt.password=Stuffffit --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jsparkCluster.scala
This would execute my code on Spark and drop into the shell when done.
Now that I am trying to run this on a cluster, I think I need to use spark-submit, which I thought would be:
spark-submit --conf spark.neo4j.bolt.password=Stuffffit --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jsparkCluster.scala
but it does not like the .scala file; does it somehow have to be compiled into a class? The Scala code is a simple Scala file with several helper classes defined in it and no real main class, so to speak. I don't see it in the help files, but maybe I am missing it. Can I just spark-submit a file, or do I have to somehow give it the class, and thus change my Scala code?
I did add this to my Scala code too:
It went from this:
val conf = new SparkConf().setMaster("local").setAppName("neo4jspark")
val sc = new SparkContext(conf)
To this:
val sc = new SparkContext(new SparkConf().setMaster("spark://192.20.0.71:7077"))
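For reference, a minimal sketch of what that change might look like as a complete, compilable snippet (the master URL is the placeholder from the question, and the app name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://192.20.0.71:7077")
  .setAppName("neo4jspark")
val sc = new SparkContext(conf)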
Answered by shridharama
There are 2 quick and dirty ways of doing this:
- Without modifying the scala file
Simply use the spark shell with the -i flag:
$SPARK_HOME/bin/spark-shell -i neo4jsparkCluster.scala
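For what it's worth, the asker's original flags should be able to ride along on the same command; a hedged sketch pointing the shell at a cluster (the master URL is just a placeholder):

$SPARK_HOME/bin/spark-shell \
  --master spark://192.20.0.71:7077 \
  --conf spark.neo4j.bolt.password=Stuffffit \
  --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 \
  -i neo4jsparkCluster.scala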
- Modifying the scala file to include a main method
a. Compile:
scalac -classpath <location of spark jars on your machine> neo4jsparkCluster.scala
b. Submit it to your cluster:
/usr/lib/spark/bin/spark-submit --class <qualified class name> --master <> .
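For illustration, a minimal sketch of what such a wrapper might look like; the object name here is hypothetical and would be the <qualified class name> passed to --class, with the existing helper classes living inside or alongside it:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object for the code currently in neo4jsparkCluster.scala.
object Neo4jSparkCluster {
  def main(args: Array[String]): Unit = {
    // The master is supplied by --master on spark-submit, so it is not hard-coded here.
    val conf = new SparkConf().setAppName("neo4jspark")
    val sc = new SparkContext(conf)
    // ... existing logic from the script goes here ...
    sc.stop()
  }
}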
Answered by zachdb86
You will want to package your scala application with sbt and include Spark as a dependency within your build.sbt file.
See the Self-Contained Applications section of the quick start guide for full instructions: https://spark.apache.org/docs/latest/quick-start.html
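As a rough sketch, a build.sbt along those lines might look like this (the version numbers are illustrative and should match your cluster's Scala and Spark versions):

name := "neo4jspark"
version := "0.1"
scalaVersion := "2.11.8"

// "provided" because the cluster supplies Spark at runtime once the packaged jar is spark-submitted.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"

After sbt package, the resulting jar is what gets handed to spark-submit with --class.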
Answered by Zouzias
You can take a look at the following Hello World example for Spark, which packages your application as @zachdb86 already mentioned.

