scala - How do I print the elements of a specific RDD partition in Spark?

Question

Question by Arnav

How can I print the elements of just one particular partition, say the 5th?


val distData = sc.parallelize(1 to 50, 10)

Answer 1

Accepted answer by Fabio Fantoni

Using Spark/Scala:


val data = 1 to 50
val distData = sc.parallelize(data, 10)
// index is the 0-based partition index; println runs wherever the task runs
distData.mapPartitionsWithIndex((index: Int, it: Iterator[Int]) =>
  it.map(x => if (index == 5) println(x))).collect()

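In local mode (for example in spark-shell with a local master) this prints 26, 27, 28, 29 and 30, the elements of partition index 5; indices are 0-based, so that is the 6th partition. On a cluster, println runs on the executors, so the output goes to the executor logs rather than the driver console.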
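If the goal is to see the partition's elements on the driver, a variant of the same idea returns only partition index 5 and prints it after collect(). This is a sketch, not the answerer's code; it assumes Spark 1.4+ for SparkContext.getOrCreate (in spark-shell it simply returns the existing `sc`), and the `local[2]` master and app name are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the active context (e.g. spark-shell's `sc`) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("print-partition"))

val distData = sc.parallelize(1 to 50, 10)

// Keep only partition index 5 (0-based, i.e. the 6th partition);
// every other partition yields an empty iterator.
val partition5: Array[Int] = distData
  .mapPartitionsWithIndex((index, it) => if (index == 5) it else Iterator.empty)
  .collect()

// The data is now on the driver, so this prints to the driver's console.
partition5.foreach(println)
```

Because every other partition returns an empty iterator, only the five selected elements are shipped to the driver.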


Answer 2

Answer by urug

You could possibly use a counter with the foreachPartition() API to achieve this.


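Here is a Java program that prints the contents of each partition:

    import java.util.Arrays;
    import java.util.Iterator;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;

    // conf is an existing SparkConf
    JavaSparkContext context = new JavaSparkContext(conf);

    JavaRDD<Integer> myArray = context.parallelize(Arrays.asList(1,2,3,4,5,6,7,8,9));
    JavaRDD<Integer> partitionedArray = myArray.repartition(2);

    // count() returns the number of elements, not the number of partitions
    System.out.println("partitioned array size is " + partitionedArray.count());
    partitionedArray.foreachPartition(new VoidFunction<Iterator<Integer>>() {
        // Called once per partition with an iterator over its elements;
        // runs on the executors, so output goes to their stdout.
        public void call(Iterator<Integer> arg0) throws Exception {
            while (arg0.hasNext()) {
                System.out.println(arg0.next());
            }
        }
    });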
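The Java code above prints every partition. A minimal Scala sketch of restricting foreachPartition to a single partition follows; instead of a hand-maintained counter it reads the current partition index from TaskContext (a substitution, not the answerer's method), and it uses a LongAccumulator (Spark 2.0+) only to count how many elements were printed. The helper name `printPartition` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: prints the elements of partition `target` from inside
// foreachPartition and returns how many elements were printed.
def printPartition(rdd: RDD[Int], target: Int): Long = {
  val printed = rdd.sparkContext.longAccumulator("printed")
  rdd.foreachPartition { it =>
    // TaskContext.get() describes the task currently running on the executor.
    if (TaskContext.get().partitionId() == target) {
      it.foreach { x => println(x); printed.add(1L) }
    }
  }
  printed.sum
}

// Reuses the active context (e.g. spark-shell's `sc`) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("print-partition"))

// Prints 26..30 on the executor running partition index 5.
val count5 = printPartition(sc.parallelize(1 to 50, 10), 5)
```

As with the Java version, the println output lands on the executors' stdout, which is the console only in local mode.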

Answer 3

Answer by Dichen

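Assuming you are doing this only for testing purposes, use glom(), which turns each partition into an array. Note that glom().collect() brings every partition to the driver, so this is only practical for small RDDs. See the Spark documentation: https://spark.apache.org/docs/1.6.0/api/python/pyspark.html#pyspark.RDD.glom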


>>> rdd = sc.parallelize([1, 2, 3, 4], 2)
>>> rdd.glom().collect()
[[1, 2], [3, 4]]
>>> rdd.glom().collect()[1]
[3, 4]

Edit: Example in Scala (index 4 is the 5th partition, since indices are 0-based):


scala> val distData = sc.parallelize(1 to 50, 10)
scala> distData.glom().collect()(4)
res2: Array[Int] = Array(21, 22, 23, 24, 25)
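glom().collect() materialises all ten partitions on the driver even though only one is needed. A sketch that computes only the wanted partition uses SparkContext.runJob with an explicit partition list; this assumes the `runJob(rdd, func: Iterator[T] => U, partitions: Seq[Int])` overload available in recent Spark versions, so check your version's API docs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the active context (e.g. spark-shell's `sc`) or starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("print-partition"))
val distData = sc.parallelize(1 to 50, 10)

// runJob evaluates the function only on the listed partitions (index 4, the
// 5th partition) and returns one result per listed partition, so the other
// nine partitions are never computed or sent to the driver.
val fifth: Array[Int] =
  sc.runJob(distData, (it: Iterator[Int]) => it.toArray, Seq(4)).head

fifth.foreach(println)
```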
