scala: Spark - Error "A master URL must be set in your configuration" when submitting an app
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA terms, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38008330/
Spark - Error "A master URL must be set in your configuration" when submitting an app
Asked by Shuai Zhang
I have a Spark app which runs with no problem in local mode, but I have some problems when submitting it to the Spark cluster.
The error messages are as follows:
16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
at GroupEvolutionES$$anonfun.apply(GroupEvolutionES.scala:579)
at GroupEvolutionES$$anonfun.apply(GroupEvolutionES.scala:579)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:390)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
at org.apache.spark.rdd.RDD$$anonfun$count.apply(RDD.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$count.apply(RDD.scala:1157)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
... 14 more
16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
at GroupEvolutionES$$anonfun.apply(GroupEvolutionES.scala:579)
at GroupEvolutionES$$anonfun.apply(GroupEvolutionES.scala:579)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:390)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
at org.apache.spark.rdd.RDD$$anonfun$count.apply(RDD.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$count.apply(RDD.scala:1157)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
In the above code, GroupEvolutionES is the main class. The error message says "A master URL must be set in your configuration", but I have provided the "--master" parameter to spark-submit.
Does anyone know how to fix this problem?
Spark version: 1.6.1
Accepted answer by Dazzler
Where is the sparkContext object defined? Is it inside the main function?
I too faced the same problem; the mistake I made was that I initialized the sparkContext outside the main function, directly inside the class.
When I initialized it inside the main function, it worked fine.
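For illustration, here is a minimal Scala sketch of the difference between the two placements (the object names and job logic are made up, not the asker's actual code):
import org.apache.spark.{SparkConf, SparkContext}
// Problematic: the SparkContext lives in the object body, so it is created
// again during class initialization on every executor that loads the class,
// where spark.master is not set -- this is what produces the
// ExceptionInInitializerError / "A master URL must be set" chain above.
object MyAppBroken {
  val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
}
// Works: the SparkContext is created inside main, which runs only on the
// driver, where spark-submit has already provided the master.
object MyAppFixed {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyApp")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}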
Answer by Hyman Davidson
The TL;DR:
.config("spark.master", "local")
a list of the options for spark.master in spark 2.2.1
I ended up on this page after trying to run a simple Spark SQL Java program in local mode. To do this, I found that I could set spark.master using:
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.config("spark.master", "local")
.getOrCreate();
An update to my answer:
To be clear, this is not what you should do in a production environment. In a production environment, spark.master should be specified in one of a couple of other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where Cloudera Manager will put it), or on the command line when you submit the app (e.g. spark-submit --master yarn).
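For example (the config value, class name, and jar path below are placeholders, just to illustrate the two options):
# $SPARK_HOME/conf/spark-defaults.conf
spark.master    yarn

# or pass it when submitting the application
spark-submit --master yarn --class com.example.MyApp my-app.jar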
If you specify spark.master to be 'local' in this way, Spark will try to run in a single JVM, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get the error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.
Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:
SparkSession
.builder()
.appName("Java Spark SQL basic example")
.getOrCreate();
This will use the configurations specified on the command line/in config files.
Also, to be clear: --master and "spark.master" are the exact same parameter, just specified in different ways. Setting spark.master in code, as in my answer above, will override attempts to set --master and will override values in spark-defaults.conf, so don't do it in production. It's great for tests, though.
Also, see this answer, which links to a list of the options for spark.master and what each one actually does.
Answer by Sachin
It worked for me after replacing
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");
with
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");
I found this solution in some other thread on Stack Overflow.
Answer by Mallikarjun M
The default value of "spark.master" is spark://HOST:PORT; the following code tries to get a session from a standalone cluster running at HOST:PORT and expects the HOST:PORT value to be set in the Spark config file.
SparkSession spark = SparkSession
.builder()
.appName("SomeAppName")
.getOrCreate();
"org.apache.spark.SparkException: A master URL must be set in your configuration" states that HOST:PORTis not set in the spark configuration file.
“ org.apache.spark.SparkException: A master URL must be set in your configuration”表示未在 spark 配置文件中设置HOST:PORT。
If you do not want to bother about the value of "HOST:PORT", set spark.master as local:
SparkSession spark = SparkSession
.builder()
.appName("SomeAppName")
.config("spark.master", "local")
.getOrCreate();
Here is the link to the list of formats in which the master URL can be passed to spark.master.
Reference: Spark Tutorial - Setup Spark Ecosystem
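For quick reference, these are the commonly used master URL formats (a non-exhaustive list based on the Spark documentation; the host names and ports below are placeholders), shown here as builder calls:
import org.apache.spark.sql.SparkSession

object MasterUrlFormats {
  val builder = SparkSession.builder().appName("MasterUrlFormats")

  // Run locally with a single worker thread.
  builder.master("local")
  // Run locally with as many worker threads as there are logical cores.
  builder.master("local[*]")
  // Connect to a Spark standalone cluster master.
  builder.master("spark://host:7077")
  // Run on YARN; the cluster location comes from the Hadoop configuration.
  builder.master("yarn")
  // Connect to a Mesos cluster.
  builder.master("mesos://host:5050")
}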
Answer by Sasikumar Murugesan
If you are running a standalone application, you can use SparkContext instead of SparkSession:
val conf = new SparkConf().setAppName("Samples").setMaster("local")
val sc = new SparkContext(conf)
val textData = sc.textFile("sample.txt").cache()
Answer by kumar sanu
Just add .setMaster("local") to your code, as shown below:
val conf = new SparkConf().setAppName("Second").setMaster("local")
It worked for me! Happy coding!
Answer by Sachin Tyagi
How does the Spark context in your application pick the value for the Spark master?
- You either provide it explicitly within SparkConf while creating the SparkContext.
- Or it is picked up from System.getProperties (where SparkSubmit put it earlier, after reading your --master argument).
Now, SparkSubmit runs on the driver -- which in your case is the machine from which you're executing the spark-submit script. And this is probably working as expected for you, too.
However, from the information you've posted, it looks like you are creating a Spark context in the code that is sent to the executors -- and given that there is no spark.master system property available there, it fails. (And you shouldn't really be doing so, if this is the case.)
Can you please post the GroupEvolutionES code (specifically where you're creating your SparkContext(s))?
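In the meantime, to illustrate the point above with a minimal hypothetical sketch (not the asker's actual code; the names are made up): on the driver the --master value is already present in the system properties, so a bare SparkConf works there, but the same construction run during class initialization on an executor finds no spark.master and fails:
import org.apache.spark.{SparkConf, SparkContext}

object MasterLookupDemo {
  def main(args: Array[String]): Unit = {
    // On the driver, spark-submit has already copied --master into
    // System.getProperties under "spark.master".
    println(sys.props.get("spark.master")) // e.g. Some("yarn") or Some("spark://host:7077")

    // This SparkConf therefore finds a master without setMaster(...) being called.
    val sc = new SparkContext(new SparkConf().setAppName("MasterLookupDemo"))

    // Code referenced inside RDD closures is shipped to executors. If that code
    // (or its enclosing object's initializer) constructs another SparkContext
    // there, no spark.master property exists and it fails with
    // "A master URL must be set in your configuration".
    sc.parallelize(1 to 10).map(_ * 2).collect()
    sc.stop()
  }
}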
Answer by Nazima
Replacing:
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");
WITH
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");
Did the magic.
Answer by Nazima
I had the same problem. Here is my code before modification:
package com.asagaama

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

/**
  * Created by asagaama on 16/02/2017.
  */
object Word {

  def countWords(sc: SparkContext) = {
    // Load our input data
    val input = sc.textFile("/Users/Documents/spark/testscase/test/test.txt")
    // Split it up into words
    val words = input.flatMap(line => line.split(" "))
    // Transform into pairs and count
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile("/Users/Documents/spark/testscase/test/result.txt")
  }

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    countWords(sc)
  }

}
And after replacing:
val conf = new SparkConf().setAppName("wordCount")
with:
val conf = new SparkConf().setAppName("wordCount").setMaster("local[*]")
It worked fine!
Answer by gyuseong
Try this.
Make a trait:
import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {

  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .getOrCreate()
  }

}
Then extend it:
object Preprocess extends SparkSessionWrapper {
  // ... use the `spark` session from the trait here ...
}
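Note that this trait still expects the master to come from outside (for example from spark-submit --master or spark-defaults.conf). As a variation for local runs and tests (my own sketch, not part of the original answer; the trait name and app name are made up), you could fall back to a local master only when none has been provided:
import org.apache.spark.sql.SparkSession

trait SparkSessionWrapperWithFallback {
  // Reuse any existing session; only force local[*] when spark-submit /
  // spark-defaults.conf did not supply a master (assumption: spark.master
  // shows up as a system property on the driver when it has been set).
  lazy val spark: SparkSession = {
    val builder = SparkSession.builder().appName("Preprocess")
    if (sys.props.contains("spark.master")) builder.getOrCreate()
    else builder.master("local[*]").getOrCreate()
  }
}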