How to run a Spark example program in IntelliJ IDEA

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/21449004/

Tags: scala, intellij-idea, apache-spark

Asked by javadba

First, on the command line from the root of the downloaded Spark project, I ran

mvn package

It was successful.

Then an IntelliJ project was created by importing the Spark pom.xml.

In the IDE the example class appears fine: all of the libraries are found. This can be viewed in the screenshot.

However, when attempting to run main(), a ClassNotFoundException on SparkContext occurs.

Why can IntelliJ not simply load and run this Maven-based Scala program? And what can be done as a workaround?

As one can see below, SparkContext looks fine in the IDE, but it is not found when attempting to run: [screenshot]

The test was run by right-clicking inside main():

[screenshot]

.. and selecting Run GroupByTest

It gives

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
    at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:36)
    at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkContext
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more

Here is the run configuration:

[screenshot]

Accepted answer by Yuriy

The Spark lib isn't on your classpath.

Execute sbt/sbt assembly,

and afterwards include "/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*-deps.jar" in your project.
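
As a quick sanity check (this sketch is not part of the original answer; the object name, app name, and trivial job are illustrative), once the assembly jar is on the run classpath, a minimal program that constructs a SparkContext should start without the NoClassDefFoundError:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal classpath check: if org.apache.spark.SparkContext resolves here,
// the Spark classes are visible to the run configuration.
object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ClasspathCheck").setMaster("local"))
    println(sc.parallelize(1 to 10).count()) // should print 10
    sc.stop()
  }
}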

Answer by ray6080

This may help: IntelliJ-Runtime-error-tt11383. Change the module dependency scope from provided to compile. This worked for me.

Answer by delr3ves

You need to add the Spark dependency. If you are using Maven, just add these lines to your pom.xml:

<dependencies>
    ...
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    ...
</dependencies>

This way you'll have the dependency for compiling and testing purposes but not in the "jar-with-dependencies" artifact.

But if you want to execute the whole application in a standalone cluster running from your IntelliJ, you can add a Maven profile that adds the dependency with compile scope, like this:

<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>1.2.1</spark.version>
    <spark.scope>provided</spark.scope>
</properties>

<profiles>
    <profile>
        <id>local</id>
        <properties>
            <spark.scope>compile</spark.scope>
        </properties>
        <dependencies>
            <!--<dependency>-->
                <!--<groupId>org.apache.hadoop</groupId>-->
                <!--<artifactId>hadoop-common</artifactId>-->
                <!--<version>2.6.0</version>-->
            <!--</dependency>-->
            <!--<dependency>-->
                <!--<groupId>com.hadoop.gplcompression</groupId>-->
                <!--<artifactId>hadoop-gpl-compression</artifactId>-->
                <!--<version>0.1.0</version>-->
            <!--</dependency>-->
            <dependency>
                <groupId>com.hadoop.gplcompression</groupId>
                <artifactId>hadoop-lzo</artifactId>
                <version>0.4.19</version>
            </dependency>
        </dependencies>
        <activation>
            <activeByDefault>false</activeByDefault>
            <property>
                <name>env</name>
                <value>local</value>
            </property>
        </activation>
    </profile>
</profiles>

<dependencies>
    <!-- SPARK DEPENDENCIES -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.scope}</scope>
    </dependency>
</dependencies>

I also added an option to my application to start a local cluster if --local is passed:

  private def sparkContext(appName: String, isLocal:Boolean): SparkContext = {
      val sparkConf = new SparkConf().setAppName(appName)
      if (isLocal) {
          sparkConf.setMaster("local")
      }
      new SparkContext(sparkConf)
  }
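
For completeness, here is a minimal sketch (not from the original answer; the object name, the --local flag handling, and the trivial job are assumptions made for illustration) of how that helper can be wired into main():

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  // Same helper as above: use local mode only when explicitly requested.
  private def sparkContext(appName: String, isLocal: Boolean): SparkContext = {
    val sparkConf = new SparkConf().setAppName(appName)
    if (isLocal) {
      sparkConf.setMaster("local")
    }
    new SparkContext(sparkConf)
  }

  def main(args: Array[String]): Unit = {
    // Pass --local when running from IntelliJ with the "local" Maven profile enabled.
    val sc = sparkContext("MyApp", isLocal = args.contains("--local"))
    try {
      println(sc.parallelize(1 to 100).count()) // trivial job to confirm the context works
    } finally {
      sc.stop()
    }
  }
}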

Finally, you have to enable the "local" profile in IntelliJ in order to get the proper dependencies. Just go to the "Maven Projects" tab and enable the profile.