Connecting to a remote Spark master - Java / Scala

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/42048475/



java, scala, hadoop, apache-spark, amazon-ec2

Asked by cybertextron

I created a 3 node (1 master, 2 workers) Apache Spark cluster in AWS. I'm able to submit jobs to the cluster from the master, however I cannot get it to work remotely.


/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}

I can see from the master:


Spark Master at spark://ip-171-13-22-125.ec2.internal:7077
URL: spark://ip-171-13-22-125.ec2.internal:7077
REST URL: spark://ip-171-13-22-125.ec2.internal:6066 (cluster mode)

so when I execute SimpleApp.scala from my local machine, it fails to connect to the Spark Master:


2017-02-04 19:59:44,074 INFO  [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:54)  [] - Connecting to master spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077...
2017-02-04 19:59:44,166 WARN  [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:87)  [] - Failed to connect to spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.10-2.0.2.jar:2.0.2]
    at org.apache.spark.rpc.RpcTimeout$$anonfun.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.10-2.0.2.jar:2.0.2]
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.0.jar:?]
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.10-2.0.2.jar:2.0.2]

I know it would have worked if I had set the master to local, because then it would run locally. However, I want my client to connect to this remote master. How can I accomplish that? The Apache Spark configuration looks fine; I can even telnet to that public DNS and port, and I also configured /etc/hosts with the public DNS and hostname for each of the EC2 instances. I want to be able to submit jobs to this remote master. What am I missing?


Answered by abaghel

To bind the master host-name/IP, go to your Spark installation's conf directory (spark-2.0.2-bin-hadoop2.7/conf) and create the spark-env.sh file using the command below.


cp spark-env.sh.template spark-env.sh

Open the spark-env.sh file in the vi editor and add the line below with the host-name/IP of your master.


SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com
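
For reference, a minimal spark-env.sh could look like the sketch below. SPARK_MASTER_PORT is optional (it defaults to 7077) and is shown here only to make the binding explicit.

# spark-env.sh -- bind the standalone master to the public host name
SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com
# Optional: the master port; 7077 is the default
SPARK_MASTER_PORT=7077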

Stop and start Spark using stop-all.sh and start-all.sh. Now you can connect to the remote master using:


import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
  .getOrCreate()
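
To sanity-check the connection end to end, you can then run a trivial job on this session. The parallelize-and-count below is my illustration, not part of the original answer.

// Trivial job: if this count succeeds, the remote master accepted the
// application and the executors are reachable from the driver.
val count = spark.sparkContext.parallelize(1 to 1000).count()
println(s"Counted $count elements on the cluster")
spark.stop()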

For more information on setting environment variables please check http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts


Answered by Andrey

I had a different problem regarding launching local code on a remote cluster: the job gets submitted and resources are allocated properly, but the driver process on my local machine claims that the cluster has not accepted the job:


WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources


In the remote machine's logs I noticed that it was accepting the job with a driver-url from my local network:


ExecutorRunner:54 - Launch command: "/opt/jdk1.8.0_131/bin/java" "-cp" "/opt/spark-2.3.3-bin-hadoop2.7/conf/:/opt/spark-2.3.3-bin-hadoop2.7/jars/*" "-Xmx16384M" "-Dspark.driver.port=59399" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.88.227:59399" "--executor-id" "0" "--hostname" "172.31.50.134" "--cores" "4" "--app-id" "app-20190318121936-0000" "--worker-url" "spark://Worker@172.31.50.134:45999"


So my issue was with the wrong hostname being resolved for the driver process.
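
The answer stops at the diagnosis, so the sketch below is a common remedy rather than the original fix: set spark.driver.host (a standard Spark property) so the driver advertises an address the workers can actually reach. The address shown is a placeholder.

import org.apache.spark.sql.SparkSession

// Sketch: force the driver to advertise a worker-reachable address.
// "203.0.113.10" is a placeholder; replace it with your machine's
// address as seen from the cluster.
val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
  .config("spark.driver.host", "203.0.113.10")
  .getOrCreate()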
