How to create SparkSession with Hive support (fails with "Hive classes are not found")?

Note: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/39444493/
Asked by Subhadip Majumder
I'm getting an error while trying to run the following code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
How can it be resolved?
Accepted answer by abaghel
Add the following dependency to your Maven project:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
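(If the project builds with sbt rather than Maven, the equivalent coordinate would be the following sketch, assuming the same Spark 2.0.0 / Scala 2.11 build as above:)

// sbt resolves the _2.11 suffix automatically via %% when scalaVersion is 2.11.x
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0"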
Answered by xuchuanyin
I've looked into the source code and found that besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initialize a SparkSession. HiveConf is not contained in the spark-hive*.jar; you may be able to find it in the Hive-related jars and put it on your classpath.
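For example, one way to put extra Hive jars on the runtime classpath is to pass them to spark-submit with --jars. This is only a sketch: the jar names and paths below are hypothetical and should be adjusted to wherever your Hive jars actually live.

# Hypothetical jar paths and application jar -- adjust to your environment
${SPARK_HOME}/bin/spark-submit \
  --class com.training.hivetest.App \
  --jars /path/to/hive-exec.jar,/path/to/hive-common.jar \
  target/hivetest.jar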
Answered by Sruthi Poddutur
I had the same problem. I could resolve it by adding the following dependencies. (I worked out this list by referring to the "Compile Dependencies" section of the spark-hive_2.11 page on the Maven repository.)
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0:
<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>
Answered by Harry Nguyen
Here is my full list of dependencies for Spark 2.4.1:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.6.7.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.9</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.0.9</version>
</dependency>
Answered by Kevin Lawrence
[Updating my answer] This answer on StackOverflow is right: answer link.

I also faced issues building and running Spark with Hive support. Based on the above answer, I did the following in my Scala 2.12.8 project:
- Updated my build.sbt to the content below
- Manually removed the files in .idea/libraries
- Clicked the 'Refresh all sbt projects' button in the SBT shell window (I am using IntelliJ)
I can now run the project without any issues.
libraryDependencies += "junit" % "junit" % "4.12" % Test
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.2",
"org.apache.spark" %% "spark-sql" % "2.4.2",
"org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
Answered by Sachin Patil
For sbt, use:

// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"
We have used spark-core 2.1.0 and spark-sql 2.1.0.
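(Putting those together, a minimal dependency list for build.sbt might look like the sketch below, assuming Scala 2.11 and Spark 2.1.0 throughout:)

// All three artifacts pinned to the same Spark version to avoid mismatches
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-sql"  % "2.1.0",
  "org.apache.spark" %% "spark-hive" % "2.1.0"
)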
Answered by Deepesh Rehi
While all the top answers are correct, if you are still facing issues, remember that the error described in the question can occur even when the jars are listed in your pom.

To resolve this, make sure the versions of all your dependencies are the same. As a standard practice, maintain properties for the Spark version and the Scala version and substitute those values everywhere, so that no conflict arises from mixed versions.
Just for reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xxx.rehi</groupId>
    <artifactId>Maven9211</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>2.4.4</spark.version>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>
Answered by ganesh hegde
In my case, I had to check

Include dependencies with "Provided" scope

under my Run/Debug Configuration in IntelliJ.
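(That IntelliJ option matters when spark-hive is declared with the "provided" scope. The snippet below is only a sketch to illustrate such a declaration; the version is an example, not taken from this answer.)

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <!-- "provided" makes the jar available for compilation but leaves it off the runtime classpath
         unless the runner (spark-submit, or IntelliJ with the option above) supplies it -->
    <scope>provided</scope>
</dependency>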
Answered by Jacek Laskowski
tl;dr: You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of a Spark SQL application (not at build time, where they are needed for compilation only).

In other words, you have to have the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes on the CLASSPATH of the Spark application (which has little to do with sbt or Maven).
The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

The latter, HiveConf, is part of the hive-exec dependency (which is a transitive dependency of the spark-hive dependency above).
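(One way to sanity-check this at runtime is to probe for both classes with Class.forName from inside the application. This is just a sketch; the class name below is for illustration only and is not part of the original answer.)

// Minimal runtime check: confirm the Hive-related classes are on the application CLASSPATH
public class HiveClasspathCheck {
    public static void main(String[] args) {
        String[] required = {
                "org.apache.spark.sql.hive.HiveSessionStateBuilder",
                "org.apache.hadoop.hive.conf.HiveConf"
        };
        for (String name : required) {
            try {
                Class.forName(name);
                System.out.println("Found:   " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("Missing: " + name);
            }
        }
    }
}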
Answered by malanb5
Ensure that you are running your jar via the spark-submit script:
${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>
This is a script that loads the required classes and provides Scala support before executing your jar.
Also, as others have mentioned, please make sure you have loaded the required dependencies as well.
Example: Running a Spark Session
pom.xml
---
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.4.4</version>
    <scope>compile</scope>
</dependency>
Test.java
---
SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("hive.metastore.uris", "thrift://hivemetastore:9083")
        .enableHiveSupport()
        .getOrCreate();
Then, to execute this code via Spark:
bin/spark-submit \
--class com.TestExample \
--executor-memory 1G \
--total-executor-cores 2 \
test.jar
Thank you to @lamber-ken, who helped me with this issue.
For more information:
- Spark Documentation: Submitting Applications
- Exception: Unable to instantiate SparkSession with Hive support because Hive classes are not found