Original URL: http://stackoverflow.com/questions/29003246/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to achieve sort by value in spark java
Question by Subramanyam S
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;

JavaPairRDD<String, Float> counts = ones
    .reduceByKey(new Function2<Float, Float, Float>() {
        @Override
        public Float call(Float i1, Float i2) {
            return i1 + i2;
        }
    });
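For context: reduceByKey merges all values that share a key with the supplied function (here i1 + i2), so counts holds one summed value per id, and nothing about that step orders the pairs by value. The sketch below is a plain-Java, in-memory model of that step only (no Spark involved); the class name ReduceByKeyModel and the sample input are hypothetical, since the question does not show what ones contains.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model (no Spark) of ones.reduceByKey((i1, i2) -> i1 + i2)
// for a pair collection of (id, Float). Hypothetical helper, not a Spark API.
class ReduceByKeyModel {

    static Map<String, Float> sumByKey(List<Map.Entry<String, Float>> ones) {
        Map<String, Float> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : ones) {
            // merge() stores the value for a new key, or combines it with the
            // existing value via Float::sum, playing the role of Function2.call(i1, i2)
            counts.merge(e.getKey(), e.getValue(), Float::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input: the question does not show what "ones" contains
        List<Map.Entry<String, Float>> ones = List.of(
                Map.entry("100002", 1.5f),
                Map.entry("100003", 4.0f),
                Map.entry("100002", 2.25f));
        System.out.println(sumByKey(ones));
    }
}
```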
My output looks like this:
id,value
100002,23.47
100003,42.78
200003,50.45
190001,30.23
I would like the output to be sorted by value like:
200003,50.45
100003,42.78
190001,30.23
100002,23.47
How do I achieve this?
Answer by Daniel Langdon
Scala has a nice sortBy method. I could not find the Java equivalent, but this is the Scala implementation:
def sortBy[K](
    f: (T) => K,
    ascending: Boolean = true,
    numPartitions: Int = this.partitions.size)
    (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
  this.keyBy[K](f)
    .sortByKey(ascending, numPartitions)
    .values
So it is basically similar to the swap-key-and-value approach, but it adds a key instead of swapping the pair forward and back. I use it like this: .sortBy(_._2) (sort by picking the second element of the tuple).
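To make the keyBy -> sortByKey -> values idea concrete without a cluster, here is a minimal in-memory Java model of the Scala sortBy shown above. It operates on a java.util.List instead of an RDD, uses java.util.Map.Entry in place of a tuple, and the class and method names are hypothetical. Note that sortBy sorts ascending by default, so the descending order asked for in the question needs ascending = false (in Scala, .sortBy(_._2, ascending = false)).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// In-memory model (no Spark) of RDD.sortBy:
// keyBy(f) -> sortByKey(ascending) -> values. Hypothetical helper class.
class SortByModel {

    static <T, K extends Comparable<K>> List<T> sortBy(
            List<T> data, Function<T, K> f, boolean ascending) {
        // keyBy(f): pair every element with the key computed from it
        List<Map.Entry<K, T>> keyed = new ArrayList<>();
        for (T t : data) {
            keyed.add(Map.entry(f.apply(t), t));
        }
        // sortByKey(ascending): order the pairs by the computed key only
        Comparator<Map.Entry<K, T>> byKey = Map.Entry.comparingByKey();
        keyed.sort(ascending ? byKey : byKey.reversed());
        // values: drop the helper key and keep the original elements
        List<T> result = new ArrayList<>();
        for (Map.Entry<K, T> e : keyed) {
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Float>> counts = List.of(
                Map.entry("100002", 23.47f),
                Map.entry("100003", 42.78f),
                Map.entry("200003", 50.45f),
                Map.entry("190001", 30.23f));
        // Counterpart of .sortBy(_._2, ascending = false)
        List<Map.Entry<String, Float>> sorted = sortBy(counts, Map.Entry::getValue, false);
        sorted.forEach(e -> System.out.println(e.getKey() + "," + e.getValue()));
    }
}
```

The original elements are carried through untouched as the values of the keyed pairs, which is why no swap back is needed at the end.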
Answer by Ramana
I think there is no specific API to sort the data by value.
You may need to do the following steps:
1) Swap key and value
2) Use sortByKey API
3) Swap key and value
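The three steps above can be modelled in plain Java as follows. This is an in-memory sketch, not Spark code: java.util.Map.Entry stands in for scala.Tuple2, List.sort with a reversed key comparator stands in for sortByKey(false), and the class name SwapSortSwapModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// In-memory model (no Spark): swap -> sortByKey(false) -> swap back.
// Map.Entry plays the role of scala.Tuple2 here.
class SwapSortSwapModel {

    // Steps 1 and 3: turn (a, b) into (b, a), like Tuple2.swap()
    static <A, B> Map.Entry<B, A> swap(Map.Entry<A, B> e) {
        return Map.entry(e.getValue(), e.getKey());
    }

    static List<Map.Entry<String, Float>> sortByValueDescending(
            List<Map.Entry<String, Float>> counts) {
        // 1) Swap key and value: (id, value) -> (value, id)
        List<Map.Entry<Float, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Float> e : counts) {
            swapped.add(swap(e));
        }
        // 2) Sort by the new key (the old value) in descending order,
        //    the role sortByKey(false) plays on a JavaPairRDD
        swapped.sort(Map.Entry.<Float, String>comparingByKey().reversed());
        // 3) Swap back: (value, id) -> (id, value)
        List<Map.Entry<String, Float>> result = new ArrayList<>();
        for (Map.Entry<Float, String> e : swapped) {
            result.add(swap(e));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Float>> counts = List.of(
                Map.entry("100002", 23.47f),
                Map.entry("100003", 42.78f),
                Map.entry("200003", 50.45f),
                Map.entry("190001", 30.23f));
        for (Map.Entry<String, Float> e : sortByValueDescending(counts)) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

A List is used rather than a TreeMap keyed by value because a map would collapse ids that share the same value into one entry; sortByKey on the swapped RDD likewise keeps every pair, including duplicate keys.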
See the reference below for more details about sortByKey:
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/api/java/JavaPairRDD.html#sortByKey%28boolean%29
For the swap, we can use the Scala Tuple2 API:
http://www.scala-lang.org/api/current/index.html#scala.Tuple2
For example, suppose I have a JavaPairRDD produced by the function below:
JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
    }
});
Now, to swap the key and value, you can use the code below:
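import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaPairRDD<Integer, String> swappedPair = counts.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
    @Override
    public Tuple2<Integer, String> call(Tuple2<String, Integer> item) throws Exception {
        // Tuple2.swap() turns (key, value) into (value, key)
        return item.swap();
    }
});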
Hope this helps. You will need to adjust the data types to match your own RDD.