Scala Spark-submit ClassNotFound exception

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/25688349/


Spark-submit ClassNotFound exception

scala, jar, classpath, apache-spark

Asked by puppet

I'm having problems with a "ClassNotFound" Exception using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

    // Get the current classpath and output. Can we see simpleapp jar?
    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is:" + url.getFile))

    // Simply instantiating an instance of object and using it works fine.
    val testObj = new ClassToRoundTrip(id)
    println("testObj.id: " + testObj.id)

    val testObjBytes = Marshal.dump(testObj)
    val testObjRoundTrip = Marshal.load[ClassToRoundTrip](testObjBytes)  // <<-- ClassNotFoundException here
    testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Driver classpath is: " + url.getFile))

    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data)
    distData.foreach(x=> RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a "ClassNotFound" exception at the Marshal.load call (marked in the code above), where the ClassToRoundTrip object is deserialized. Strangely, the earlier use of the class a few lines up, where it is simply instantiated, is okay:

spark-submit --class "SimpleApp" \
             --master local[4] \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, if I add extra parameters for "--driver-class-path" and "--jars", it works fine locally:

spark-submit --class "SimpleApp" \
             --master local[4] \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master still generates the same issue:

spark-submit --class "SimpleApp" \
             --master spark://localhost.localdomain:7077 \
             --driver-class-path /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             --jars /home/xxxxxxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar \
             target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.

Logs for one of the executors are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. The ClassToRoundTrip is included in the JAR. I would rather not have to hardcode values in SPARK_CLASSPATH or SparkContext.addJar. Can anyone help?

Accepted answer by busybug91

I had this same issue. If the master is local, the program runs fine for most people. But if it is set to "spark://myurl:7077" (as in my case), it doesn't work. Most people get the error because an anonymous class is not found during execution. It is resolved by using SparkContext.addJars("Path to jar").

Make sure you are doing the following things (a sketch combining these points follows below):

  • SparkContext.addJars("Path to jar created from maven [hint: mvn package]").
  • I have used SparkConf.setMaster("spark://myurl:7077") in code and supplied the same as an argument when submitting the job to Spark via the command line.
  • When you specify the class on the command line, make sure you are writing its complete name with the package, e.g. "packageName.ClassName".
  • The final command should look like this: bin/spark-submit --class "packageName.ClassName" --master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar
  • SparkContext.addJars("从 maven 创建的 jar 路径 [提示:mvn 包]")。
  • 我在代码中使用了 SparkConf.setMaster(" spark://myurl:7077") 并在通过命令行提交作业到 spark 时提供了相同的参数。
  • 在命令行中指定 class 时,请确保使用 URL 编写它的完整名称。例如:“packageName.ClassName”
  • 最终命令应如下所示 bin/spark-submit --class "packageName.ClassName"--master spark://myurl:7077 pathToYourJar/target/yourJarFromMaven.jar

Note: the jar pathToYourJar/target/yourJarFromMaven.jar in the last point is the same one that is also set in code, as in the first point of this answer.
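
Putting those points together, here is a minimal sketch, reusing the hypothetical master URL and jar path from the list above. Note that the actual SparkContext method is addJar (singular), taking one path per call:

import org.apache.spark.{SparkConf, SparkContext}

object PackagedApp {
  def main(args: Array[String]) {
    // Master set in code, matching the --master value used on the command line.
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://myurl:7077")  // hypothetical master URL
    val sc = new SparkContext(conf)

    // Ship the application jar to the executors so that its classes
    // (including anonymous classes inside closures) can be deserialized there.
    sc.addJar("pathToYourJar/target/yourJarFromMaven.jar")

    // ... the rest of the job ...
    sc.stop()
  }
}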

Answer by Yifei

I also had the same issue. I think --jars was not shipping the jars to the executors. After I added this into the SparkConf, it worked fine.

 val conf = new SparkConf().setMaster("...").setJars(Seq("/a/b/x.jar", "/c/d/y.jar"))
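
With the jars listed in the SparkConf this way, the plain submit command from the question should be enough; no --jars flag is needed. A hedged example (the master URL here is hypothetical):

spark-submit --class "SimpleApp" \
             --master spark://myurl:7077 \
             target/scala-2.10/simpleapp_2.10-1.0.jar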

This web page for troubleshooting is useful too.

Answer by capotee

You should set SPARK_CLASSPATH in the spark-env.sh file like this:

SPARK_LOCAL_IP=your local ip 
SPARK_CLASSPATH=your external jars
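
For instance, with hypothetical values (the IP and jar paths below are placeholders, not from the original answer; multiple jars are colon-separated, like a normal Java classpath):

SPARK_LOCAL_IP=192.168.1.100
SPARK_CLASSPATH=/home/user/jars/first-dependency.jar:/home/user/jars/second-dependency.jar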

and you should submit with spark-submit like this:

spark-submit --class your.runclass --master spark://yourSparkMasterHostname:7077 /your.jar

and your Java code like this:

SparkConf sparkconf = new SparkConf().setAppName("sparkOnHbase");
JavaSparkContext sc = new JavaSparkContext(sparkconf);

then it will work.

Answer by bp2010

If you are using Maven and the Maven Assembly plugin to build your jar file with mvn package, ensure that the assembly plugin is configured correctly to point to your Spark app's main class.

Something like this should be added to your pom.xml to avoid any java.lang.ClassNotFoundException:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.4.1</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.my.package.SparkDriverApp</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <skipAssembly>false</skipAssembly>
            </configuration>
            <executions>
                <execution>
                    <id>package</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
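
With this plugin in place, mvn package produces an additional artifact suffixed -jar-with-dependencies.jar, and that fat jar is the one to submit. A sketch of the workflow (the artifact name and master URL below are assumptions, not from the original answer):

mvn package
spark-submit --class "com.my.package.SparkDriverApp" \
             --master spark://myurl:7077 \
             target/myapp-1.0-jar-with-dependencies.jar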

Answer by RushHour

What I figured out is that if you build your project without any warnings, then you don't have to write extra code for the master and other things. Although it is good practice, you can simply avoid it. In my case there were no warnings in the project, so I was able to run it without any extra code. Project Structure Link

In cases where I had some build-related warnings, I had to take care of the JAR paths, my URL, and the master in code as well as while executing it.

I hope it may help someone. Cheers!
