Show partitions on a pyspark RDD

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/29056079/

Tags: python, apache-spark, pyspark

Asked by javadba

The pyspark RDD documentation

http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD

does not show any method(s) to display partition information for an RDD.

Is there any way to get that information without executing an additional step, e.g.:

myrdd.mapPartitions(lambda x: iter([1])).sum()

The above does work, but it seems like extra effort.
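For reference, here is a minimal, self-contained sketch of that workaround. The SparkContext setup, partition count, and variable names are illustrative and not from the original post:

from pyspark import SparkContext

sc = SparkContext("local[4]", "partition-count-demo")  # hypothetical local setup
myrdd = sc.parallelize(range(100), 8)  # explicitly request 8 partitions

# Emit a single 1 per partition, then sum them to count the partitions.
num_partitions = myrdd.mapPartitions(lambda x: iter([1])).sum()
print(num_partitions)  # 8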

Accepted answer by javadba

I missed it: very simple:

rdd.getNumPartitions()

Not used to the java-ish getFooMethod() anymore ;)
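As a quick usage sketch (assuming the SparkContext sc and the 8-partition RDD from the snippet above):

myrdd = sc.parallelize(range(100), 8)
print(myrdd.getNumPartitions())  # 8, read from partition metadata without running a job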

Update: Adding in the comment from @dnlbrky:

dataFrame.rdd.getNumPartitions()
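For completeness, a sketch of the DataFrame case. It assumes a modern PySpark with SparkSession (on Spark 1.x you would build the DataFrame from a SQLContext instead), and the example data is made up:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("df-partitions").getOrCreate()
dataFrame = spark.createDataFrame([(i, i * i) for i in range(100)], ["x", "x_squared"])
print(dataFrame.rdd.getNumPartitions())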