spark-submit for a .scala file

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/47663695/


spark-submit for a .scala file

scala apache-spark

Asked by Codejoy

I have been running some test Spark Scala code, probably using a bad way of doing things, with spark-shell:

spark-shell --conf spark.neo4j.bolt.password=Stuffffit --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jsparkCluster.scala 

This would execute my code on spark and pop into the shell when done.

Now that I am trying to run this on a cluster, I think I need to use spark-submit, which I thought would be:

spark-submit --conf spark.neo4j.bolt.password=Stuffffit --packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11 -i neo4jsparkCluster.scala 

But it does not like the .scala file; does it somehow have to be compiled into a class? The scala code is a simple scala file with several helper classes defined in it and no real main class, so to speak. I don't see it in the help files, but maybe I am missing it: can I just spark-submit a file, or do I have to somehow give it the class, and thus change my scala code?

I did add this to my scala code too:

It went from this:

val conf = new SparkConf().setMaster("local").setAppName("neo4jspark")


val sc = new SparkContext(conf)  

To this:

val sc = new SparkContext(new SparkConf().setMaster("spark://192.20.0.71:7077"))
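
For reference, here is a minimal sketch of the same setup with the parentheses balanced and an app name set; the master URL is simply the one from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Build the configuration explicitly, then create the context.
val conf = new SparkConf()
  .setMaster("spark://192.20.0.71:7077")
  .setAppName("neo4jspark")
val sc = new SparkContext(conf)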

Answer by shridharama

There are 2 quick and dirty ways of doing this:

  1. Without modifying the scala file

Simply use the spark shell with the -i flag:

$SPARK_HOME/bin/spark-shell -i neo4jsparkCluster.scala

  2. Modifying the scala file to include a main method (a minimal sketch of such a wrapper is shown after these steps)

a. Compile:

scalac -classpath <location of spark jars on your machine> neo4jsparkCluster.scala

b. Submit it to your cluster:

/usr/lib/spark/bin/spark-submit --class <qualified class name> --master <> .

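For illustration, here is a minimal sketch of the kind of wrapper step 2 describes; the object name is hypothetical, and the existing helper classes and logic from neo4jsparkCluster.scala would go inside main:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical entry point; spark-submit would be pointed at it with --class Neo4jSparkCluster.
object Neo4jSparkCluster {
  def main(args: Array[String]): Unit = {
    // The master is left to spark-submit's --master flag rather than hard-coded here.
    val conf = new SparkConf().setAppName("neo4jspark")
    val sc = new SparkContext(conf)

    // ... existing helper classes and logic from neo4jsparkCluster.scala ...

    sc.stop()
  }
}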

Answer by zachdb86

You will want to package your scala application with sbt and include Spark as a dependency within your build.sbt file.

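As a rough sketch, a minimal build.sbt could look like the following; the project name and the Scala/Spark versions are assumptions and should match what your cluster runs:

name := "neo4jsparkCluster"

version := "0.1"

scalaVersion := "2.11.8"

// "provided" keeps Spark out of the packaged jar, since the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"

Running sbt package then produces a jar under target/ that you can hand to spark-submit.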

See the Self-Contained Applications section of the quick start guide for full instructions: https://spark.apache.org/docs/latest/quick-start.html

Answer by Zouzias

You can take a look at the following Hello World example for Spark, which packages your application as @zachdb86 already mentioned.

spark-hello-world
