scala - How to run a Spark example program in IntelliJ IDEA
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/21449004/
How to run a Spark example program in IntelliJ IDEA
Asked by javadba
First, on the command line from the root of the downloaded Spark project, I ran
mvn package
It was successful.
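(Side note: the Spark Maven build is often run with tests skipped to save time; an illustrative invocation, not from the original post:)
mvn -DskipTests clean package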
Then an IntelliJ project was created by importing the Spark pom.xml.
In the IDE the example class appears fine: all of the libraries are found. This can be seen in the screenshot.
However, when attempting to run main(), a ClassNotFoundException on SparkContext occurs.
Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?
As one can see below, the SparkContext looks fine in the IDE, but is not found when attempting to run:
The test was run by right-clicking inside main():
... and selecting Run GroupByTest.
It gives:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Here is the run configuration:

Accepted answer by Yuriy
The Spark lib isn't on your classpath.
Execute sbt/sbt assembly,
and afterwards include "/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar" in your project.
Answered by ray6080
This may help: IntelliJ-Runtime-error-tt11383. Change the module dependency scope from provided to compile. This worked for me.
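If that provided scope comes from your pom rather than from hand-edited module settings, the equivalent fix is to declare the Spark dependency with compile scope (a sketch, using the same property placeholders as the answer below):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <!-- compile (the default) instead of provided, so the IDE run configuration sees the classes -->
    <scope>compile</scope>
</dependency>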
Answered by delr3ves
You need to add the Spark dependency. If you are using Maven, just add these lines to your pom.xml:
<dependencies>
    ...
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    ...
</dependencies>
This way you'll have the dependency for compiling and testing purposes, but not in the "jar-with-dependencies" artifact.
But if you want to execute the whole application in a standalone cluster running from your IntelliJ, you can add a Maven profile that adds the dependency with compile scope. Like this:
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>1.2.1</spark.version>
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <id>local</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
        <dependencies>
            <!--<dependency>-->
                <!--<groupId>org.apache.hadoop</groupId>-->
                <!--<artifactId>hadoop-common</artifactId>-->
                <!--<version>2.6.0</version>-->
            <!--</dependency>-->
            <!--<dependency>-->
                <!--<groupId>com.hadoop.gplcompression</groupId>-->
                <!--<artifactId>hadoop-gpl-compression</artifactId>-->
                <!--<version>0.1.0</version>-->
            <!--</dependency>-->
            <dependency>
                <groupId>com.hadoop.gplcompression</groupId>
                <artifactId>hadoop-lzo</artifactId>
                <version>0.4.19</version>
            </dependency>
        </dependencies>
        <activation>
            <activeByDefault>false</activeByDefault>
            <property>
                <name>env</name>
                <value>local</value>
            </property>
        </activation>
    </profile>
</profiles>
<dependencies>
    <!-- SPARK DEPENDENCIES -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.scope}</scope>
    </dependency>
</dependencies>
I also added an option to my application to start a local cluster if --local is passed:
private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
  val sparkConf = new SparkConf().setAppName(appName)
  if (isLocal) {
    sparkConf.setMaster("local")
  }
  new SparkContext(sparkConf)
}
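For completeness, here is a minimal sketch of how such a --local flag could be wired into the entry point; the object name, app name, and argument handling are illustrative assumptions, not part of the original answer, and it assumes the sparkContext helper above lives in the same object:
import org.apache.spark.SparkContext

object MyJob {
  // hypothetical entry point; sparkContext(...) is the helper shown above
  def main(args: Array[String]): Unit = {
    // assumed convention: pass --local to run against a local master instead of a cluster
    val isLocal = args.contains("--local")
    val sc: SparkContext = sparkContext("my-job", isLocal)
    // ... job logic goes here ...
    sc.stop()
  }
}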
Finally, you have to enable the "local" profile in IntelliJ in order to get the proper dependencies. Just go to the "Maven Projects" tab and enable the profile.
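Outside the IDE, the same profile can also be activated on the Maven command line, either by its id or through the property it is keyed on (illustrative commands, assuming the profile definition above):
mvn clean package -Plocal
mvn clean package -Denv=local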

