scala NoClassDefFoundError: SparkSession - even though build is working
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40383880/
NoClassDefFoundError: SparkSession - even though build is working
Asked by Make42
I copied https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/RandomForestClassifierExample.scala into a new project and set up a build.sbt:
name := "newproject"
version := "1.0"
scalaVersion := "2.11.8"
javacOptions ++= Seq("-source", "1.8", "-target", "1.8")
scalacOptions += "-deprecation"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.0.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.0.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.11" % "2.0.0" % "provided",
  "org.jpmml" % "jpmml-sparkml" % "1.1.1",
  "org.apache.maven.plugins" % "maven-shade-plugin" % "2.4.3",
  "org.scalatest" %% "scalatest" % "3.0.0"
)
I am able to build it from IntelliJ 2016.2.5, but when I run it I get the error
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.ml.RandomForestClassifierExample$.main(RandomForestClassifierExample.scala:32)
at org.apache.spark.examples.ml.RandomForestClassifierExample.main(RandomForestClassifierExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I am even able to click on SparkSession and get to the source code. What is the problem?
Answered by Thilo
When you mark a dependency as provided, the build will compile against that dependency, but it will not be added to the classpath at runtime (it is assumed to be already there).
That is the correct setting when building Spark jobs for spark-submit (because they will run inside a Spark container that does provide the dependency, and including it a second time would cause trouble).
However, when you run locally, you need that dependency to be present. So either change the build so that the dependency is not provided (but then you need to adjust it again when building for job submission), or configure your runtime classpath in the IDE to already include that jar file.
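For the sbt route, one common sketch (not part of the original answer; it assumes an sbt 0.13-style build like the one above) is to keep the Spark dependencies marked provided for packaging, but point sbt's run task at the compile classpath, which does include provided dependencies:

// Keep the Spark jars "provided" for spark-submit, but make `sbt run` use the
// compile classpath (which includes provided dependencies) so the example can
// also be started locally.
run in Compile := Defaults.runTask(
  fullClasspath in Compile,
  mainClass in (Compile, run),
  runner in (Compile, run)
).evaluated

This way the artifact built for spark-submit stays free of Spark jars, while running locally with sbt run still works.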
Answered by Garren S
In my case, I was using my local Cloudera CDH 5.9.0 cluster with Spark 1.6.1 installed by default and Spark 2.0.0 installed as a parcel. Thus, spark-submit was using Spark 1.6.1 while spark2-submit was using Spark 2.0.0. Since SparkSession did not exist in 1.6.1, the error was thrown. Using the correct spark2-submit command resolved the problem.
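As a quick diagnostic (a sketch that is not part of the original answer; the object name is made up), you can print the Spark version that is actually on the runtime classpath at the start of the job, to see which submit command picked up which installation:

import org.apache.spark.sql.SparkSession

object SparkVersionCheck {
  def main(args: Array[String]): Unit = {
    // SPARK_VERSION reflects the Spark jars on the runtime classpath; on a CDH
    // setup like the one above it typically prints 1.6.x under spark-submit and
    // 2.0.0 under spark2-submit.
    println(s"Runtime Spark version: ${org.apache.spark.SPARK_VERSION}")

    // This part only works on a Spark 2.x runtime; on 1.6.x it fails with the
    // same NoClassDefFoundError as in the question.
    val spark = SparkSession.builder().appName("SparkVersionCheck").getOrCreate()
    println(s"SparkSession reports: ${spark.version}")
    spark.stop()
  }
}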
Answered by Ravi
I ran into the same issue, and it was fixed by setting the SPARK_HOME variable before submitting the Spark job with spark-submit.
Answered by sparker
Ok, I landed here following a link on the sbt gitter channel while searching for something else. I have a solution for this. Thilo described the problem correctly: your sbt file says "provided", which is correct for your target environment when you run on a cluster where the Spark libraries are provided, but when you run locally within IntelliJ you need to "provide" these external libraries to IntelliJ at runtime. A way to do that would be:
- Right click on your project ->
- Open Module settings ->
- Select Libraries on LHS menu ->
- Click + sign ->
- choose 'From Maven' ->
- Type or search for the Maven coordinates. You can search by typing the library name and hitting the Tab key. This will show a dropdown of all matches, and you can choose the correct version for your library ->
- Click OK
Note that when you restart IntelliJ you might have to repeat this process. I found this to be the case for IntelliJ IDEA 2016.3.6 on OS X El Capitan.

