scala - How to get the number of workers (executors) in PySpark?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38660907/

Date: 2020-10-22 08:31:25  Source: igfitidea

How to get the number of workers(executors) in PySpark?

scala, apache-spark, pyspark

Asked by American curl

I need to use this parameter, so how can I get the number of workers? In Scala, I can call sc.getExecutorMemoryStatus to get the available number of workers, but in PySpark there seems to be no API exposed to get this number.


Answered by Ram Ghadiyaram

In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both return the number of executors, including the driver, as in the example snippet below:


/** Method that just returns the current active/registered executors
  * excluding the driver.
  * @param sc The spark context to retrieve registered executors.
  * @return a list of executors each in the form of host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}

But in the Python API this is not implemented.

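Until such an API is exposed, a common workaround in PySpark is to reach the underlying Scala SparkContext through the py4j gateway. The sketch below assumes an active SparkContext named sc; note that sc._jsc is a private attribute rather than public API, so it may change or break between Spark versions.

# Sketch of a PySpark workaround via the internal JVM gateway.
# NOTE: sc._jsc is a private attribute (not public API) and may break across versions.
num_entries = sc._jsc.sc().getExecutorMemoryStatus().size()
# getExecutorMemoryStatus includes the driver, so subtract one to count only executors.
num_executors = num_entries - 1
print("registered executors:", num_executors)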

@DanielDarabos' answer also confirms this.


The equivalent of this in Python:


sc.getConf().get("spark.executor.instances")

Edit (Python):


which might be sc._conf.get('spark.executor.instances')

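Keep in mind that spark.executor.instances is only present when a fixed number of executors was requested (for example via --num-executors); with dynamic allocation it may not be set at all, and it reflects what was requested rather than what is currently registered. A minimal usage sketch, again assuming an active SparkContext named sc:

# Sketch: read the configured executor count, handling the case where it is unset.
configured = sc.getConf().get("spark.executor.instances")  # returns None if not set
if configured is None:
    # e.g. dynamic allocation: the property is absent, so count registered executors instead.
    print("spark.executor.instances is not set")
else:
    # Configuration values come back as strings, so convert before using as a number.
    print("configured executors:", int(configured))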