Spark: check your cluster UI to ensure that workers are registered

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/35662596/

Tags: scala, hadoop, apache-spark, cloudera, cloudera-manager

Asked by vineet sinha

I have a simple program in Spark:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://10.250.7.117:7077").setAppName("Simple Application").set("spark.cores.max","2")
    val sc = new SparkContext(conf)    
    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")

    //first get the first 10 records 
    println("Getting the first 10 records: ")
    ratingsFile.take(10)    

    //get the number of records in the movie ratings file
    println("The number of records in the movie list are : ")
    ratingsFile.count() 
  }
}

When I try to run this program from the spark-shell, i.e. I log into the name node (Cloudera installation) and run the commands sequentially in the spark-shell:

val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")
println("Getting the first 10 records: ")
ratingsFile.take(10)    
println("The number of records in the movie list are : ")
ratingsFile.count() 

I get correct results, but if I try to run the program from Eclipse, no resources are assigned to the program, and all I see in the console log is:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Also, in the Spark UI, I see this:

[Screenshot: "Job keeps Running - Spark" (the application stays in a waiting/running state in the Spark UI)]

Also, it should be noted that this version of spark was installed with Cloudera (hence no worker nodes show up).

What should I do to make this work?

EDIT:

I checked the HistoryServer, and these jobs don't show up there (not even under incomplete applications).

Accepted answer by javadba

I have done configuration and performance tuning for many Spark clusters, and this is a very common/normal message to see when you are first prepping/configuring a cluster to handle your workloads.

This is unequivocally due to insufficient resources to launch the job. The job is requesting one of the following (a configuration sketch that caps these requests follows the list):

  • more memory per worker than is allocated to it (1 GB)
  • more CPUs than are available on the cluster
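
A minimal sketch of how those requests can be capped so they fit a small cluster. The property names (spark.executor.memory, spark.cores.max) are standard Spark settings and the master URL is the one from the question, but the values (512m, 2 cores) are illustrative assumptions, not measurements of this cluster:

import org.apache.spark.{SparkConf, SparkContext}

object SmallFootprintApp {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("spark://10.250.7.117:7077")  // standalone master taken from the question
      .setAppName("Simple Application")
      .set("spark.executor.memory", "512m")    // stay below the 1 GB each worker offers
      .set("spark.cores.max", "2")             // never ask for more cores than the cluster has
    val sc = new SparkContext(conf)
    // ... job code goes here ...
    sc.stop()
  }
}

If the job accepts resources after lowering these values, the original request was simply larger than what the registered workers advertise.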

Answered by vineet sinha

Finally figured out what the answer is.

When deploying a spark program on a YARN cluster, the master URL is just yarn.

So in the program, the Spark context should just look like:

val conf = new SparkConf().setAppName("SimpleApp")
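
Putting it together, the whole program becomes roughly the following. This is a sketch based on the code in the question, with the master deliberately left unset so that spark-submit can supply it (and with the results printed, which the original code computed but never displayed):

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    // No setMaster here: spark-submit --master yarn decides where this runs.
    val conf = new SparkConf().setAppName("SimpleApp")
    val sc = new SparkContext(conf)

    val ratingsFile = sc.textFile("hdfs://hostname:8020/user/hdfs/mydata/movieLens/ds_small/ratings.csv")

    println("Getting the first 10 records: ")
    ratingsFile.take(10).foreach(println)

    println("The number of records in the movie list are : ")
    println(ratingsFile.count())

    sc.stop()
  }
}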

This Eclipse project should then be built with Maven, and the generated jar deployed by copying it to the cluster and running the following command:

spark-submit --master yarn --class "SimpleApp" Recommender_2-0.0.1-SNAPSHOT.jar

This means that running directly from Eclipse will not work.

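If you still want to run and debug from inside the IDE while developing, one common workaround (an assumption about the workflow, not part of the original answer) is a local master, which needs no cluster at all:

// Local debugging only: the job runs inside the IDE's JVM using all local cores.
val conf = new SparkConf()
  .setAppName("SimpleApp")
  .setMaster("local[*]")

The jar that is submitted with --master yarn should not hard-code either master.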

Answered by iwwenbo

You can check your cluster's worker node cores: your application cannot exceed that. For example, say you have two worker nodes with 4 cores per worker node, and 2 applications to run. Then you can give each application 4 cores to run its job.

You can set this in the code like so:

SparkConf sparkConf = new SparkConf().setAppName("JianSheJieDuan")
                          .set("spark.cores.max", "4");

It works for me.

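In the Scala API used elsewhere in this question, the equivalent would be roughly (a sketch, reusing the app name from the snippet above):

val sparkConf = new SparkConf()
  .setAppName("JianSheJieDuan")
  .set("spark.cores.max", "4")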

Answered by Sarmon

There are also other causes of this same error message besides the ones posted here.

For a spark-on-mesos cluster, make sure you have Java 8 or a newer Java version on the Mesos slaves.

For Spark standalone, make sure you have Java 8 (or newer) on the workers.

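One rough way to verify the executor JVMs is to ask them from a spark-shell (a sketch; it is only useful once executors can actually start, which is exactly what fails in the original question):

// Each task reports the Java version of the JVM it runs in.
val executorJavaVersions = sc.parallelize(1 to 100, 10)
  .map(_ => System.getProperty("java.version"))
  .distinct()
  .collect()
executorJavaVersions.foreach(println)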

Answered by Saket

You don't have any workers to execute the job. There are no cores available for the job to execute on, which is why the job's state is still 'WAITING'.

If you have no workers registered with Cloudera, how will the jobs execute?

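A quick way to see what the driver thinks is registered (a sketch using the SparkContext API; the driver itself also appears in this list):

// Lists the block managers known to the driver: the driver plus every registered executor.
sc.getExecutorMemoryStatus.foreach { case (host, (maxMem, remainingMem)) =>
  println(s"$host -> max: $maxMem bytes, free: $remainingMem bytes")
}

If only the driver shows up, no workers have registered and the WARN message above is expected.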