How to run Scala script using spark-submit (similarly to Python script)?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44346776/
Asked by Roman
I am trying to execute a simple Scala script using Spark, as described in the Spark Quick Start Tutorial. I have no trouble executing the following Python code:
"""SimpleApp.py"""
from pyspark import SparkContext
logFile = "tmp.txt" # Should be some file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
I execute this code using the following command:
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.py
However, if I try to do the same using Scala, I run into problems. In more detail, the code I try to execute is:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "tmp.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
I try to execute it in the following way:
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit hello_world.scala
As a result, I get the following error message:
Error: Cannot load main class from JAR file
Does anybody know what I am doing wrong?
Answered by eliasah
I want to add to @JacekLaskowski's answer an alternative solution I sometimes use for POC or testing purposes.
It would be to use the script.scala from inside the spark-shell with :load.
:load /path/to/script.scala
You won't need to define a SparkContext/SparkSession, as the script will use the variables defined in the scope of the REPL.
You also don't need to wrap the code in a Scala object.
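For example (a minimal sketch I'm adding for illustration, not part of the original answer), a script.scala meant for :load could reuse the sc that spark-shell already creates and the tmp.txt file from the question:

// script.scala -- run inside spark-shell with :load /path/to/script.scala
// Relies on the `sc` (SparkContext) that the spark-shell REPL already provides,
// so there is no SparkConf/SparkContext setup and no wrapping object.
val logFile = "tmp.txt" // should be some file on your system
val logData = sc.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, lines with b: $numBs")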
PS: I consider this more of a hack, not something to use for production purposes.
Answered by Jacek Laskowski
Use spark-submit --help to see the available options and arguments.
$ ./bin/spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
As you can see in the first Usage, spark-submit requires <app jar | python file>.
The app jar argument is a Spark application's jar with the main object (SimpleApp in your case).
You can build the app jar using sbt or maven, as described in the official documentation's Self-Contained Applications section:
Suppose we wish to write a self-contained application using the Spark API. We will walk through a simple application in Scala (with sbt), Java (with Maven), and Python.
and later in that section:
we can create a JAR package containing the application's code, then use the spark-submit script to run our program.
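To make that concrete (the following is only a sketch; the project name, versions, and paths are my assumptions, not something stated in the answer), a minimal sbt build for the SimpleApp above could look like this:

// build.sbt -- assumed minimal build definition for the SimpleApp example
name := "simple-app"

version := "0.1"

scalaVersion := "2.11.8" // Spark 2.1.x is built against Scala 2.11

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1" % "provided"

With SimpleApp.scala placed under src/main/scala/, running sbt package produces a jar under target/scala-2.11/, and that jar (not the .scala source) is what you pass to spark-submit, for example:

# package the application, then submit the resulting jar (paths are illustrative)
sbt package
/home/aaa/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit \
  --class SimpleApp \
  --master local \
  target/scala-2.11/simple-app_2.11-0.1.jar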
p.s. Use Spark 2.1.1.