What does setMaster `local[*]` mean in Spark?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/32356143/



scala, apache-spark

Asked by Freewind

I found some code to start spark locally with:


val conf = new SparkConf().setAppName("test").setMaster("local[*]")
val ctx = new SparkContext(conf)

What does the [*] mean?


Answer by ccheneson

From the doc:


./bin/spark-shell --master local[2]

The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.


And from here:


local[*]: Run Spark locally with as many worker threads as logical cores on your machine.

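A minimal sketch to check this on your own machine (the app name is just a placeholder): under local[*], the context's default parallelism should match the number of logical cores the JVM sees.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: compare the JVM's logical core count with Spark's default parallelism under local[*].
val conf = new SparkConf().setAppName("local-star-check").setMaster("local[*]")
val sc = new SparkContext(conf)
println(Runtime.getRuntime.availableProcessors()) // logical cores visible to the JVM
println(sc.defaultParallelism)                    // the same number when the master is local[*]
sc.stop()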

Answer by FreeMan

Master URL and its meaning:

local: Run Spark locally with one worker thread (i.e. no parallelism at all).

local[K]: Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).

local[K,F]: Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable)

local[*]: Run Spark locally with as many worker threads as logical cores on your machine.

local[*,F]: Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.

spark://HOST:PORT: Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.

spark://HOST1:PORT1,HOST2:PORT2: Connect to the given Spark standalone cluster with standby masters with Zookeeper. The list must have all the master hosts in the high availability cluster set up with Zookeeper. The port must be whichever each master is configured to use, which is 7077 by default.

mesos://HOST:PORT: Connect to the given Mesos cluster. The port must be whichever you have configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.

yarn: Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.


https://spark.apache.org/docs/latest/submitting-applications.html

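For illustration, a hedged sketch showing that the application code stays the same across all of these master URLs. MasterUrlDemo, its default of local[*], and the idea of taking the master from the first program argument are made up for this example; in practice the master is usually left out of the code and passed to spark-submit instead.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: the same application can target any of the master URLs above
// simply by changing the string passed to setMaster.
object MasterUrlDemo {
  def main(args: Array[String]): Unit = {
    val master = if (args.nonEmpty) args(0) else "local[*]" // e.g. "yarn", "spark://host:7077"
    val conf = new SparkConf().setAppName("master-url-demo").setMaster(master)
    val sc = new SparkContext(conf)
    println(s"Connected to master: ${sc.master}")
    sc.stop()
  }
}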

Answer by mat77

Some additional Info


Do not run Spark Streaming programs locally with master configured as "local" or "local[1]". This allocates only one CPU for tasks and if a receiver is running on it, there is no resource left to process the received data. Use at least "local[2]" to have more cores.


From the book "Learning Spark: Lightning-Fast Big Data Analysis"

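To make that advice concrete, here is a minimal Spark Streaming sketch assuming a socket source on localhost:9999 (the host, port, and batch interval are placeholders). With local[2], one thread can run the receiver while the other processes each batch.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With "local" or "local[1]" the receiver would occupy the only thread
// and nothing would be left to process the received data.
val conf = new SparkConf().setAppName("streaming-local2").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
lines.count().print()
ssc.start()
ssc.awaitTermination()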

Answer by Ram Ghadiyaram

Master URL


You can run Spark in local mode using local, local[n], or the most general local[*] for the master URL.


The URL says how many threads can be used in total:


local uses 1 thread only.


local[n] uses n threads.


local[*] uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).


local[N, maxFailures] (called local-with-retries) with N being * or the number of threads to use (as explained above) and maxFailures being the value of spark.task.maxFailures.

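As a small illustration of the local-with-retries form, a sketch with an arbitrary choice of 4 threads and 4 allowed task failures:

import org.apache.spark.{SparkConf, SparkContext}

// "Local-with-retries": 4 worker threads, and each task may fail up to 4 times
// before the job is aborted (the second number plays the role of
// spark.task.maxFailures, which is 1 in plain local mode).
val conf = new SparkConf().setAppName("local-with-retries").setMaster("local[4, 4]")
val sc = new SparkContext(conf)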

Answer by Devbrat Shukla

You can run Spark in local mode using local, local[n] or the most general local[*] for the master URL.


The URL says how many threads can be used in total:


local uses 1 thread only.


local[n] uses n threads.


local[*] uses as many threads as are available on the local machine where you are running your Spark application.


You can check this with lscpu on your Linux machine:


[ie@mapr2 ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2


If your machine has 56 cores (that is, 56 logical CPUs), your Spark jobs will be partitioned into 56 parts.


NOTE: it may be the case that the spark-defaults.conf file in your Spark cluster limits the partition value to a default (like 10 or some other number); in that case your partitioning will match the default value set in the config.

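A sketch of the note above, with spark.default.parallelism set to an example value of 10 directly in code rather than in spark-defaults.conf:

import org.apache.spark.{SparkConf, SparkContext}

// With local[*] on a 56-core machine, sc.defaultParallelism would normally be 56,
// but an explicit spark.default.parallelism takes precedence. The value 10 is only an example.
val conf = new SparkConf()
  .setAppName("partition-check")
  .setMaster("local[*]")
  .set("spark.default.parallelism", "10")

val sc = new SparkContext(conf)
println(sc.defaultParallelism)                       // 10, not the core count
println(sc.parallelize(1 to 1000).getNumPartitions)  // also 10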

local[N, maxFailures] (called local-with-retries) with N being * or the number of threads to use (as explained above) and maxFailures being the value of spark.task.maxFailures.
