scala ERROR SparkContext: Error initializing SparkContext

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/36038188/

Date: 2020-10-22 08:05:11 · Source: igfitidea

ERROR SparkContext: Error initializing SparkContext

Tags: scala, apache-spark

Asked by G.Saleh

I am using spark-1.5.0-cdh5.6.0 and tried the sample application (Scala). The command is:

> spark-submit --class com.cloudera.spark.simbox.sparksimbox.WordCount --master local /home/hadoop/work/testspark.jar

Got the following error:

 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/user/spark/applicationHistory does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:424)
        at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
        at com.cloudera.spark.simbox.sparksimbox.WordCount$.main(WordCount.scala:12)
        at com.cloudera.spark.simbox.sparksimbox.WordCount.main(WordCount.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
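
For context, the trace shows the SparkContext failing during construction at WordCount.scala:12. The question doesn't include the source, but a hypothetical minimal reconstruction of that entry point would look like:

    // Hypothetical sketch of the application in the stack trace; the real source is not shown.
    package com.cloudera.spark.simbox.sparksimbox

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")
        // The FileNotFoundException above is thrown here, inside the SparkContext constructor
        val sc = new SparkContext(conf)
        // ... word-count logic ...
        sc.stop()
      }
    }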

Answered by Yuval Itzchakov

Spark has a feature called the "history server" which allows you to browse historical events after the SparkContext dies. It is enabled by setting spark.eventLog.enabled to true.

You have two options: either specify a valid directory to store the event log via the spark.eventLog.dir config value, or simply set spark.eventLog.enabled to false if you don't need it.
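
As a minimal sketch, both options can be set programmatically on the SparkConf before the context is created (the directory path below is just an example; the same keys can equally be passed via --conf on spark-submit or in the Spark configuration files):

    import org.apache.spark.{SparkConf, SparkContext}

    // Option 1: keep event logging on, but point it at a directory that already exists.
    // Spark does not create this directory for you.
    val conf = new SparkConf()
      .setAppName("WordCount")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:///tmp/spark-events")  // example path

    // Option 2: turn event logging off entirely if you don't need the history server.
    // val conf = new SparkConf().setAppName("WordCount").set("spark.eventLog.enabled", "false")

    val sc = new SparkContext(conf)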

You can read more on that in the Spark Configuration page.

Answered by Nagesh Singh Chauhan

I got the same error while working with NLTK in Spark. To fix it, I just removed all the NLTK-related properties from spark-conf.default.
