java - How to achieve sort by value in Spark Java

Disclaimer: this page is an English-Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/29003246/

How to achieve sort by value in spark java

Tags: java, sorting, apache-spark

Asked by Subramanyam S

// ones is a JavaPairRDD<String, Float>; reduceByKey sums the values for each key
JavaPairRDD<String, Float> counts = ones
            .reduceByKey(new Function2<Float, Float, Float>() {
                @Override
                public Float call(Float i1, Float i2) {
                    return i1 + i2;
                }
            });

My output looks like this:

id,value
100002,23.47
100003,42.78
200003,50.45
190001,30.23

I would like the output to be sorted by value like:

200003,50.45
100003,42.78
190001,30.23
100002,23.47

How do I achieve this?

Answered by Daniel Langdon

Scala has a nice sortBy method. I could not find the Java equivalent, but this is the Scala implementation:

  def sortBy[K](
      f: (T) => K,
      ascending: Boolean = true,
      numPartitions: Int = this.partitions.size)
      (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
    this.keyBy[K](f)
        .sortByKey(ascending, numPartitions)
        .values

So, it is basically similar to the above, but it adds a key instead of swapping back and forth. I use it like this: .sortBy(_._2) (sort by picking the second element of the tuple).
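
If I read that Scala source correctly, the same keyBy / sortByKey / values chain can also be expressed through the Java API. A minimal sketch, assuming counts is the JavaPairRDD<String, Float> from the question and that keyBy is inherited from JavaRDDLike (not code from the answer above):

// needs org.apache.spark.api.java.function.Function and scala.Tuple2 imports
// Key each (id, value) tuple by its value, sort on that key (false = descending),
// then drop the key again to recover the original tuples in sorted order.
JavaPairRDD<Float, Tuple2<String, Float>> keyedByValue = counts.keyBy(
        new Function<Tuple2<String, Float>, Float>() {
            @Override
            public Float call(Tuple2<String, Float> tuple) {
                return tuple._2();
            }
        });

JavaRDD<Tuple2<String, Float>> sortedByValue = keyedByValue.sortByKey(false).values();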

Answered by Ramana

I think there is no specific API to sort the data by value.

Maybe you need to do the steps below:

1) Swap key and value
2) Use the sortByKey API
3) Swap key and value back

Look at more details about sortByKey in the reference below:
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/api/java/JavaPairRDD.html#sortByKey%28boolean%29

For the swap, we can use the Scala Tuple2 API:

http://www.scala-lang.org/api/current/index.html#scala.Tuple2

For example, I have a JavaPairRDD from the function below.

JavaPairRDD<String, Integer> counts = ones.reduceByKey(
        new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

Now, to swap the key and value, you can use the code below:

JavaPairRDD<Integer, String> swappedPair = counts.mapToPair(
        new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> item) throws Exception {
                return item.swap();
            }
        });
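
The answer stops after step 1; a hedged continuation for steps 2 and 3 (my own wiring of the steps above, not code from the answer) might look like this:

// Step 2: sort on the swapped key, i.e. the original value; false = descending.
JavaPairRDD<Integer, String> sortedPair = swappedPair.sortByKey(false);

// Step 3: swap back so the result is (id, value) again, now ordered by value.
JavaPairRDD<String, Integer> sortedCounts = sortedPair.mapToPair(
        new PairFunction<Tuple2<Integer, String>, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(Tuple2<Integer, String> item) throws Exception {
                return item.swap();
            }
        });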

Hope this helps. You need to take care of the data types.