Only one SparkContext may be running in this JVM - [SPARK]
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43890060/
Asked by trick15f
I'm trying to run the following code to get twitter information live:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import twitter4j.auth.Authorization
import twitter4j.Status
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.api.java.JavaDStream
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream
val consumerKey = "xxx"
val consumerSecret = "xxx"
val accessToken = "xxx"
val accessTokenSecret = "xxx"
val url = "https://stream.twitter.com/1.1/statuses/filter.json"
val sparkConf = new SparkConf().setAppName("Twitter Streaming")
val sc = new SparkContext(sparkConf)
val documents: RDD[Seq[String]] = sc.textFile("").map(_.split(" ").toSeq)
// Twitter Streaming
val ssc = new JavaStreamingContext(sc,Seconds(2))
val conf = new ConfigurationBuilder()
conf.setOAuthAccessToken(accessToken)
conf.setOAuthAccessTokenSecret(accessTokenSecret)
conf.setOAuthConsumerKey(consumerKey)
conf.setOAuthConsumerSecret(consumerSecret)
conf.setStreamBaseURL(url)
conf.setSiteStreamBaseURL(url)
val filter = Array("Twitter", "Hadoop", "Big Data")
val auth = AuthorizationFactory.getInstance(conf.build())
val tweets : JavaReceiverInputDStream[twitter4j.Status] = TwitterUtils.createStream(ssc, auth, filter)
val statuses = tweets.dstream.map(status => status.getText)
statuses.print()
ssc.start()
But when it reaches the command val sc = new SparkContext(sparkConf), the following error appears:
17/05/09 09:08:35 WARN SparkContext: Multiple running SparkContexts detected in the same JVM! org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.
I have tried to add the following parameters to the sparkConf value, but the error still appears:
val sparkConf = new SparkConf().setAppName("Twitter Streaming").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")
If I ignore the error and continue running commands I get this other error:
17/05/09 09:15:44 WARN ReceiverSupervisorImpl: Restarting receiver with delay 2000 ms: Error receiving tweets 401:Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync. \n\n\nError 401 Unauthorized HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
Any kind of contribution is appreciated. Regards, and have a good day.
Answered by Rick Moritz
A spark-shell already prepares a spark-session or spark-context for you to use - so you don't have to / can't initialize a new one. Usually, at the end of the spark-shell launch process, a line tells you under which variable it is available to you. allowMultipleContexts exists only for testing some functionalities of Spark, and shouldn't be used in most cases.
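As a sketch of the idea above (assuming you are running inside spark-shell, where the shell has already bound the running context to the variable sc), you would reuse the existing context rather than construct a new one. SparkContext.getOrCreate is another way to get a handle on the context that already exists:

```scala
// Inside spark-shell: do NOT call `new SparkContext(...)` -
// the shell already binds the running context to `sc`
// (and, on Spark 2.x, a SparkSession to `spark`).
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// getOrCreate returns the already-running context instead of
// failing with "Only one SparkContext may be running in this JVM".
val sc = SparkContext.getOrCreate()

// Build the streaming context on top of the existing SparkContext.
val ssc = new StreamingContext(sc, Seconds(2))
```

The 401 error is a separate problem: it means the twitter4j consumer key/secret and access token/secret were rejected, so those credentials (and the machine's clock) need to be checked independently of the SparkContext issue.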