How to create SparkSession with Hive support (fails with "Hive classes are not found")?

Note: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/39444493/
Asked by Subhadip Majumder
I'm getting an error while trying to run the following code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
How can it be resolved?
Accepted answer by abaghel
Add the following dependency to your Maven project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
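(If the project builds with sbt rather than Maven, the equivalent coordinate would be the following sketch, assuming the same Spark 2.0.0 / Scala 2.11 build as above:)

// sbt resolves the _2.11 suffix automatically via %% when scalaVersion is 2.11.x
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0"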
Answered by xuchuanyin
I've looked into the source code and found that besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initialize a SparkSession. HiveConf is not contained in the spark-hive*.jar; you may be able to find it in the Hive-related jars and put it on your classpath.
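For example, one way to put extra Hive jars on the runtime classpath is to pass them to spark-submit with --jars. This is only a sketch: the jar names and paths below are hypothetical and should be adjusted to wherever your Hive jars actually live.

# Hypothetical jar paths and application jar -- adjust to your environment
${SPARK_HOME}/bin/spark-submit \
  --class com.training.hivetest.App \
  --jars /path/to/hive-exec.jar,/path/to/hive-common.jar \
  target/hivetest.jar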
Answered by Sruthi Poddutur
I had the same problem. I could resolve it by adding the following dependencies. (I worked out this list by referring to the "Compile Dependencies" section of the spark-hive_2.11 page on the Maven repository.)
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0:
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>
Answered by Harry Nguyen
Here is my full list of dependencies for Spark 2.4.1:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.6.7.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.9</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.9</version>
</dependency>
Answered by Kevin Lawrence
[Updating my answer] This answer on StackOverflow is right: answer link.

I also faced issues building and running Spark with Hive support. Based on the above answer, I did the following in my Scala 2.12.8 project:
- Updated my build.sbt to the content below
- Manually removed the files in .idea/libraries
- Clicked the 'Refresh all sbt projects' button in the SBT shell window (I am using IntelliJ)
I can now run the project without any issues.
libraryDependencies += "junit" % "junit" % "4.12" % Test
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.2",
"org.apache.spark" %% "spark-sql" % "2.4.2",
"org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
Answered by Sachin Patil
For sbt, use:

// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"
We have used spark-core 2.1.0 and spark-sql 2.1.0.
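(Putting those together, a minimal dependency list for build.sbt might look like the sketch below, assuming Scala 2.11 and Spark 2.1.0 throughout:)

// All three artifacts pinned to the same Spark version to avoid mismatches
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0",
  "org.apache.spark" %% "spark-hive" % "2.1.0"
)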
Answered by Deepesh Rehi
While all the top answers are correct, if you are still facing issues, remember that the error described in the question can occur even when the jars are listed in your pom.

To resolve this, make sure the versions of all your dependencies are the same. As a standard practice, maintain properties for the Spark version and the Scala version and substitute those values everywhere, so that no conflict arises from mixed versions.
Just for reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xxx.rehi</groupId>
    <artifactId>Maven9211</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>2.4.4</spark.version>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>
Answered by ganesh hegde
In my case, I had to check

Include dependencies with "Provided" scope

under my Run/Debug Configuration in IntelliJ.
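(That IntelliJ option matters when spark-hive is declared with the "provided" scope. The snippet below is only a sketch to illustrate such a declaration; the version is an example, not taken from this answer.)

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <!-- "provided" makes the jar available for compilation but leaves it off the runtime classpath
         unless the runner (spark-submit, or IntelliJ with the option above) supplies it -->
    <scope>provided</scope>
</dependency>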
Answered by Jacek Laskowski
tl;dr: You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of a Spark SQL application (not at build time, where they are needed for compilation only).

In other words, you have to have the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes on the CLASSPATH of the Spark application (which has little to do with sbt or Maven).
The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

The latter, HiveConf, is part of the hive-exec dependency (which is a transitive dependency of the spark-hive dependency above).
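(One way to sanity-check this at runtime is to probe for both classes with Class.forName from inside the application. This is just a sketch; the class name below is for illustration only and is not part of the original answer.)

// Minimal runtime check: confirm the Hive-related classes are on the application CLASSPATH
public class HiveClasspathCheck {
    public static void main(String[] args) {
        String[] required = {
                "org.apache.spark.sql.hive.HiveSessionStateBuilder",
                "org.apache.hadoop.hive.conf.HiveConf"
        };
        for (String name : required) {
            try {
                Class.forName(name);
                System.out.println("Found:   " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("Missing: " + name);
            }
        }
    }
}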
Answered by malanb5
Ensure that you are running your jar via the spark-submit script:
${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>
This is a script that loads the required classes and provides Scala support before executing your jar.
Also, as others have mentioned, please make sure you have loaded the required dependencies as well.
Example: Running a Spark Session
pom.xml
---
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>
Test.java
---
SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("hive.metastore.uris", "thrift://hivemetastore:9083")
        .enableHiveSupport()
        .getOrCreate();
Then, to execute this code via Spark:
bin/spark-submit \
--class com.TestExample \
--executor-memory 1G \
--total-executor-cores 2 \
test.jar
Thank you to @lamber-ken, who helped me with this issue.
For more information:
- Spark Documentation: Submitting Applications
- Exception: Unable to instantiate SparkSession with Hive support because Hive classes are not found