Scala: java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/28165032/
java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags
Asked by vpv
I am a beginner in Spark Streaming and Scala. For a project requirement I was trying to run the TwitterPopularTags example from GitHub. As SBT assembly was not working for me and I was not familiar with SBT, I tried using Maven for the build. After a lot of initial hiccups, I was able to create the jar file. But while trying to execute it I get the following error. Can anybody help me resolve this?
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
at TwitterPopularTags$.main(TwitterPopularTags.scala:43)
at TwitterPopularTags.main(TwitterPopularTags.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
I have added the following dependencies: spark-streaming_2.10:1.1.0, spark-core_2.10:1.1.0, spark-streaming-twitter_2.10:1.1.0.
I even tried 1.2.0 for spark-streaming-twitter, but that also gave me the same error.
Thanks in advance for the help.
Regards, vpv
Answered by vpv
Thank you for the suggestions. In the end I was able to resolve this issue by using SBT assembly after all. Here are the details of how I did it.
Spark - already present in the Cloudera VM.
Scala - not sure whether this comes with Cloudera; if not, we can install it.
SBT - this also needs to be installed. I did both installs on my local machine and transferred the jar to the VM. For installing SBT I used the following link:
https://gist.github.com/visenger/5496675
1) Once all of these are in place, we have to create the parent folder for our project. I created a folder called Twitter.
2) Create another folder with the structure Twitter/src/main/scala, and in that folder create a .scala file named TwitterPopularTags.scala. It has slight changes from the code we got from GitHub; I had to change the import statements:
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkConf
3) After this, create another folder under the parent folder named
Twitter/project
and in it create a file named assembly.sbt. This holds the reference to the assembly plugin. The full contents of the file are:
resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
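For orientation, the resulting layout (including the build.sbt created in step 4 below) is:

Twitter/
    project/
        assembly.sbt
    src/
        main/
            scala/
                TwitterPopularTags.scala
    build.sbt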
4) Once the above two are created, create a file named build.sbt in the parent directory of the project (Twitter). This is where we provide the name of the .jar file we want to create, as well as the dependencies. Note that even the blank lines between the settings in this file are important (sbt of that era required settings in .sbt files to be separated by blank lines).
name := "TwitterPopularTags"

version := "1.0"

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.2.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
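As an aside, the <<= operator above is specific to old sbt/sbt-assembly releases; in newer sbt-assembly versions (0.14.x) the same merge strategy would be written roughly as follows (a sketch, assuming the newer plugin):

assemblyMergeStrategy in assembly := {
  // drop META-INF manifests and signatures that collide across jars
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // for everything else, keep the first copy found
  case x => MergeStrategy.first
}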
5) Finally, open a terminal and go to the parent folder of the project (Twitter). From there, enter the following command:
sbt assembly
This will download the dependencies and create the jar file we need.
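If the build succeeds, the assembled jar is written under the target directory; with the name and version from the build.sbt above, the default sbt-assembly output path would be (an assumption based on those settings):

target/scala-2.10/TwitterPopularTags-assembly-1.0.jar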
6) In order to run the program, we need a Twitter app created under our ID, and we must provide its auth token and other details. The detailed steps for creating this are in the following link:
http://ampcamp.berkeley.edu/3/exercises/realtime-processing-with-spark-streaming.html
7) Once all of the above is done, we can use the spark-submit command from the VM to run the job. An example command is:
./bin/spark-submit \
  --class TwitterPopularTags \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret
8) This prints the output to the console, so to monitor the output it is better to reduce the print frequency by adjusting the code.
Please let me know if any more details are required.
Thanks & Regards,
VPV
Answered by Marek Dudek
Found an easy solution (it works with 1.5.1 for sure, but maybe with earlier versions too):
Submit with the --packages parameter and Maven coordinates, like:
spark-submit --master local[*] \
--class TwitterStreaming \
--packages "org.apache.spark:spark-streaming-twitter_2.10:1.5.1" \
${PATH_TO_JAR_IN_TARGET}
Described at
http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
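As a side note, --packages resolves the given Maven coordinates and their transitive dependencies at submit time (multiple coordinates go in one comma-separated list), so the connector does not have to be packaged into your application jar. Applied to the TwitterPopularTags job from the accepted answer, it would look roughly like this (jar path and arguments are assumptions carried over from above):

spark-submit --master local[4] \
  --class TwitterPopularTags \
  --packages "org.apache.spark:spark-streaming-twitter_2.10:1.5.1" \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret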
Answered by hexabunny
This error simply means that the TwitterUtils class (or, in Scala terms, the TwitterUtils object) is not present at runtime, although it was present at compile time (otherwise you would not have been able to build with Maven). You should make sure that the jar file you created indeed contains that class/object. You can simply unzip the jar file to see what is actually included. Most likely your Maven build uses the dependency to compile the project but does not include it in your final jar.
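For example, a quick way to check from the command line (the jar name here is an assumption):

jar tf TwitterPopularTags.jar | grep TwitterUtils

If the class was packaged, this lists entries such as org/apache/spark/streaming/twitter/TwitterUtils$.class; if it prints nothing, the dependency never made it into the jar.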
Answered by DeepikaB
Try to do it this way...
./bin/spark-submit \
  --class TwitterPopularTags \
  --jars <external jars such as twitter4j-stream and spark-streaming-twitter> \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret
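Filled in with concrete file names (these are assumptions; use the jars matching your Spark and Scala versions), it would look like:

./bin/spark-submit \
  --class TwitterPopularTags \
  --jars spark-streaming-twitter_2.10-1.1.0.jar,twitter4j-stream-3.0.3.jar \
  --master local[4] \
  /path/to/TwitterPopularTags.jar \
  consumerkey consumersecret accesstoken accesssecret

Note that --jars takes a comma-separated list and must appear before the application jar.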
Answered by Priyanshu Singh
I have the same problem and I am not able to fix it. My build.sbt:
name := "SentimentAnalyser"
version := "0.1"
scalaVersion := "2.11.11"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.2.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-twitter_2.11
// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.11
libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"
And the code:

package com

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object Sentiment {
  def main(args: Array[String]): Unit = {
    if (args.length < 4) {
      System.out.print("Enter Consumer Key (API Key), Consumer Secret (API Secret), Access Token, Access Token Secret")
      System.exit(1)
    }
    val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) = args.take(4)

    // twitter4j reads the OAuth credentials from these system properties
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    // note: a receiver occupies one core, hence the local[n], n > 1 warning below
    val conf = new SparkConf().setAppName("Sentiment").setMaster("local")
    val scc = new StreamingContext(conf, Seconds(30))

    // DStream of tweets
    val stream = TwitterUtils.createStream(scc, None)
    val hashTag = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

    // swap to (count, topic) so that sortByKey orders by count
    val topHashTag60 = hashTag.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (topic, count) => (count, topic) }.transform(_.sortByKey(false))
    val topHashTag10 = hashTag.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(10))
      .map { case (topic, count) => (count, topic) }.transform(_.sortByKey(false))

    topHashTag60.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("Popular topic in last 60 sec (%s total)".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })
    topHashTag10.foreachRDD(rdd => {
      val topList = rdd.take(10)
      println("Popular topic in last 10 sec (%s total)".format(rdd.count()))
      topList.foreach { case (count, tag) => println("%s (%s tweets)".format(tag, count)) }
    })

    scc.start()
    scc.awaitTermination()
  }
}
I build the jar using an artifact in IntelliJ IDEA and submit it with:
spark-submit --class com.Sentiment /root/Desktop/SentimentAnalyser.jar XX XX XX XX
ERROR:
17/10/29 01:22:24 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.46.132, 34179, None)
17/10/29 01:22:27 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
at com.Sentiment$.main(Sentiment.scala:26)
at com.Sentiment.main(Sentiment.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterUtils$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java
Answered by JMess
To include the dependencies in your jar you will need to instruct Maven to build a "fat jar". A "fat jar" is a jar that includes the .class files not only for your project, but for all required dependencies as well (this is what sbt assembly does). The default Maven behavior is to treat your project like a library and therefore build a jar with only your .class files.
Here is a simple Maven pom that will do what you want. Note that I have included some other common Spark + Maven behaviors, such as using Scala, but the most relevant part is near the bottom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.jmess.sparkexamples</groupId>
    <artifactId>example</artifactId>
    <version>1.0.0</version>

    <properties>
        <!-- Use java 1.8 -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <!-- Keep compiled against scala version uniform -->
        <scala.base.version>2.11</scala.base.version>
        <!-- Use most recent version of Scala compatible with stable Spark release -->
        <scala.version>${scala.base.version}.12</scala.version>
        <!-- Facilitates keeping multiple Spark dependencies aligned -->
        <spark.version>2.4.0</spark.version>
    </properties>

    <dependencies>
        <!-- Begin Spark Dependencies -->
        <!-- Provides the base Spark APIs. Required for base functionality -->
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.base.version}</artifactId>
            <version>${spark.version}</version>
            <!-- In most cases this dependency is supplied by Spark -->
            <scope>provided</scope>
        </dependency>
        <!-- Provides the expanded APIs for Streaming with Kafka. Required in addition to spark-sql library -->
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_${scala.base.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- End Spark Dependencies -->

        <!-- Popular scala configuration library -->
        <dependency>
            <groupId>com.typesafe</groupId>
            <artifactId>config</artifactId>
            <version>1.3.2</version>
        </dependency>
        <!-- To write to Splunk HTTP endpoint -->
    </dependencies>

    <build>
        <!-- Tells scala-maven-plugin where to look -->
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>

        <plugins>
            <!-- For building scala projects using maven -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.0.1</version>
                <!-- Includes the compiled Scala .class files in some maven goals -->
                <executions>
                    <execution>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- !!!!!!! BUILD FAT JAR !!!!!!! -->
            <!-- Build a fat jar named example-1.0.0-jar-with-dependencies.jar -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Note: if you are submitting your job through Spark instead of YARN, then uncomment the <scope>provided</scope> line.
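With that pom in place, a minimal build-and-submit sequence would look like this sketch (the main class name is an assumption; substitute your own):

mvn clean package
spark-submit \
  --class com.jmess.sparkexamples.Main \
  --master local[4] \
  target/example-1.0.0-jar-with-dependencies.jar

Dependencies marked provided (spark-sql above) are left out of the fat jar and must come from the Spark installation at runtime; everything else, including spark-sql-kafka-0-10, is packaged inside the jar, which is exactly what prevents the NoClassDefFoundError discussed in this question.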

