scala - How to get the number of workers (executors) in PySpark?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38660907/
How to get the number of workers (executors) in PySpark?
Asked by American curl
I need to use this parameter, so how can I get the number of workers?
Like in Scala, I can call sc.getExecutorMemoryStatus to get the available number of workers. But in PySpark, it seems there's no API exposed to get this number.
Answered by Ram Ghadiyaram
In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both return the number of executors including the driver, as in the example snippet below:
/** Method that just returns the current active/registered executors
* excluding the driver.
* @param sc The spark context to retrieve registered executors.
* @return a list of executors each in the form of host:port.
*/
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
val driverHost: String = sc.getConf.get("spark.driver.host")
allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
}
But in the Python API it is not implemented.
@DanielDarabos' answer also confirms this.
The equivalent to this in Python is:
sc.getConf().get("spark.executor.instances")
Edit (Python):
which might be sc._conf.get('spark.executor.instances')
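Putting both hints together, here is a minimal PySpark sketch (an illustration under assumptions, not part of the original answer). The spark.executor.instances setting is only present when executors are requested statically, and the second approach reaches through the private _jsc handle into the same getExecutorMemoryStatus shown in the Scala snippet above, so it is unsupported and may change between Spark versions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("executor-count").getOrCreate()
sc = spark.sparkContext

# Configured executor count; only set when --num-executors /
# spark.executor.instances was passed explicitly.
configured = sc.getConf().get("spark.executor.instances", "not set")

# Unsupported/private: call the JVM SparkContext's getExecutorMemoryStatus
# through py4j. The map has one entry per executor plus one for the driver.
status = sc._jsc.sc().getExecutorMemoryStatus()
num_workers = status.size() - 1  # subtract the driver entry

print("configured:", configured, "registered workers:", num_workers)

In local mode the driver is the only registered entry, so num_workers comes out as 0.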

