从 Eclipse 运行 Spark 应用程序
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/29321237/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Running Spark Application from Eclipse
提问by RagHaven
I am trying to develop a spark application on Eclipse, and then debug it by stepping through it.
我正在尝试在 Eclipse 中开发一个 Spark 应用程序,然后通过单步执行来调试它。
I downloaded the Spark source code and I have added some of the spark sub projects(such as spark-core) to Eclipse. Now, I am trying to develop a spark application using Eclipse. I have already installed the ScalaIDE on Eclipse. I created a simple application based on the example given in the Spark website.
我下载了 Spark 源代码,并在 Eclipse 中添加了一些 spark 子项目(例如 spark-core)。现在,我正在尝试使用 Eclipse 开发一个 Spark 应用程序。我已经在 Eclipse 上安装了 ScalaIDE。我根据 Spark 网站中给出的示例创建了一个简单的应用程序。
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
To my project, I added the spark-core project as a dependent project (right click -> build path -> add project). Now, I am trying to build my application and run it. However, my project shows that it has errors, but I don't see any errors listed in the problems view within Eclipse, nor do I see any lines highlighted in red. So, I am not sure what the problem is. My assumption is that I need to add external jars to my project, but I am not sure what these jars would be. The error is caused by val conf = new SparkConf().setAppName("Simple Application") and the subsequent lines. I tried removing those lines, and the error went away. I would appreciate any help and guidance, thanks!
在我的项目中,我将 spark-core 项目添加为依赖项目(右键单击 -> 构建路径 -> 添加项目)。现在,我正在尝试构建我的应用程序并运行它。但是,我的项目显示它有错误,我却在 Eclipse 的问题视图中没有看到任何错误,也没有看到任何以红色突出显示的行。所以,我不确定是什么问题。我的假设是我需要将外部 jar 添加到我的项目中,但我不确定这些 jar 是什么。该错误是由 val conf = new SparkConf().setAppName("Simple Application") 及后续行引起的。我尝试删除这些行,错误就消失了。如果能得到任何帮助和指导,我将不胜感激,谢谢!
回答by xhudik
It seems you are not using any package/library manager (e.g. sbt, Maven), which would eliminate most versioning issues. It can be challenging to set the correct versions of Java, Scala, Spark and all of their transitive dependencies on your own. I strongly recommend converting your project to Maven: Convert Existing Eclipse Project to Maven Project
看来您没有使用任何包/库管理器(例如 sbt、Maven),而它们本可以消除大多数版本控制问题。自行设置 Java、Scala、Spark 及其所有传递依赖项的正确版本可能具有挑战性。我强烈建议将您的项目转换为 Maven 项目:Convert Existing Eclipse Project to Maven Project
Personally, I have very good experiences with sbt on IntelliJ IDEA (https://confluence.jetbrains.com/display/IntelliJIDEA/Getting+Started+with+SBT) which is easy to set up and maintain.
就我个人而言,我在 IntelliJ IDEA ( https://confluence.jetbrains.com/display/IntelliJIDEA/Getting+Started+with+SBT)上使用 sbt 有很好的经验,它易于设置和维护。
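For illustration, here is a minimal build.sbt sketch for such a setup. The project name is made up, and the Spark/Scala versions are assumptions (they follow the 1.3.0 / 2.10.x combination mentioned in the next answer), so adjust them to whatever release you actually target:

name := "simple-spark-app"  // hypothetical project name

version := "0.1.0"

scalaVersion := "2.10.4"

// sbt resolves spark-core together with all of its transitive dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0"

After importing the project into your IDE (for example via an Eclipse import plugin such as sbteclipse, or IntelliJ IDEA's sbt support), the Spark classes should resolve and the error on the SparkConf line should disappear.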
回答by Marko Bonaci
I created a Maven archetype for Spark just the other day.
It sets up a new Spark 1.3.0 project in Eclipse/Idea with Scala 2.10.4.
前几天我刚刚为 Spark 创建了一个 Maven 原型。
它会在 Eclipse/Idea 中设置一个使用 Scala 2.10.4 的新 Spark 1.3.0 项目。
Just follow the instructions here.
只需按照此处的说明操作即可。
You'll just have to change the Scala version after the project is generated:
Right click on the generated project and select: Scala > Set the Scala Installation > Fixed 2.10.5 (bundled)
您只需要在项目生成后更改 Scala 版本:
右键单击生成的项目并选择:Scala > Set the Scala Installation > Fixed 2.10.5.(bundled)
The default version that comes with ScalaIDE (currently 2.11.6) is automatically added to the project by ScalaIDE when it detects scala-maven-plugin in the pom.
当 ScalaIDE 在 pom 中检测到 scala-maven-plugin 时,它会把 ScalaIDE 附带的默认版本(当前为 2.11.6)自动添加到项目中。
I'd appreciate feedback if someone knows how to set the Scala library container version from Maven while it bootstraps a new project. Where does ScalaIDE look up the Scala version, if anywhere?
如果有人知道如何在 Maven 引导新项目时设置 Scala 库容器(Scala library container)的版本,我将不胜感激。ScalaIDE 是从哪里查找 Scala 版本的(如果有的话)?
BTW, just make sure you download the sources (Project right-click > Maven > Download sources) before stepping into Spark code in the debugger.
顺便说一句,在调试器中单步进入 Spark 代码之前,请确保先下载源代码(Project right-click > Maven > Download sources)。
If you want to use (IMHO the very best) Eclipse goodies (References, Type hierarchy, Call hierarchy) you'll have to build Spark yourself, so that all the sources are on your build path (as Maven Scala dependencies are not processed by Eclipse IDE/JDT, even though they are, of course, on the build path).
如果您想使用(恕我直言是最好的)Eclipse 功能(References、类型层次结构、调用层次结构),您必须自己构建 Spark,以便所有源代码都在您的构建路径上(因为 Maven 的 Scala 依赖项不会被 Eclipse IDE/JDT 处理,尽管它们当然在构建路径上)。
Have fun debugging, I can tell you that it helped me tremendously to get deeper into Spark and really understand how it works :)
祝调试愉快,我可以告诉你,它极大地帮助了我深入了解 Spark 并真正了解它的工作原理:)
回答by Iulian Dragos
You could try to add the spark-assembly.jar instead.
您可以尝试改为添加 spark-assembly.jar。
As others have noted, the better way is to use sbt (or Maven) to manage your dependencies. spark-core has many dependencies itself, and adding just that one jar won't be enough.
正如其他人所指出的,更好的方法是使用 sbt(或 Maven)来管理您的依赖项。spark-core 本身有很多依赖项,仅添加这一个 jar 是不够的。
回答by Prashant Bhardwaj
You haven't specified the master in your Spark code. Since you're running it on your local machine, replace the following line
您尚未在 Spark 代码中指定 master。由于您是在本地机器上运行它,请将以下行
val conf = new SparkConf().setAppName("Simple Application")
with
替换为
val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
Here "local[2]" means 2 threads will be used.
这里的“local[2]”表示将使用 2 个线程。
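Putting it together, here is a minimal sketch of the question's example adjusted to run directly from the IDE. The object name SimpleAppLocal is made up for illustration, and the log file path is still a placeholder you need to point at a real file:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleAppLocal {
  def main(args: Array[String]) {
    // "local[2]" runs Spark inside this JVM with 2 worker threads, so no cluster is required
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    val logFile = "YOUR_SPARK_HOME/README.md" // placeholder: point this at an existing text file
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

    sc.stop()
  }
}

With the master set this way you can run or debug the object straight from Eclipse, set breakpoints in your own code, and (with sources attached) step into Spark itself.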