Original source: http://stackoverflow.com/questions/26970001/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Order by value in spark pair RDD
Asked by Vijay Innamuri
I have a Spark pair RDD of (key, count) as below:
Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))
Using the Spark Scala API, how do I get a new pair RDD that is sorted by value?
Required result: Array((d,3), (b,2), (a,1), (c,1))
Answered by Gábor Bakos
This should work:
//Assuming the pair's second type has an Ordering, which is the case for Int
rdd.sortBy(_._2) // same as rdd.sortBy(pair => pair._2)
(Though you might want to take the key into account too when there are ties.)
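As a minimal sketch of the tie-breaking idea, one option is to sort on a composite key. The names below (SortByValueWithTies, sortByCountDesc, the local[2] master) are illustrative assumptions, not part of the original answer. Negating the count gives the descending order shown in the question's required result, and the key breaks ties in ascending order:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SortByValueWithTies {
  // Sort on (-count, key): larger counts come first, and equal counts
  // are ordered by key ascending. Assumes counts never equal Int.MinValue.
  def sortByCountDesc(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.sortBy { case (key, count) => (-count, key) }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `sc` already exists.
    val spark = SparkSession.builder()
      .appName("sort-by-value")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

    // Collect to the driver so the pairs print in sorted order.
    sortByCountDesc(rdd).collect().foreach(println)
    // prints (d,3), (b,2), (a,1), (c,1), one per line

    spark.stop()
  }
}
```

If only descending order matters and ties can fall in any order, rdd.sortBy(_._2, ascending = false) is enough.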
Answered by Nagaraj Vittal
Sort by key and value in ascending and descending order
val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))
// Sort by value in descending order (the second argument is `ascending`)
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2, false)
// Sort by value in ascending order (the default)
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortBy(_._2)
// Sort by key in ascending order (the parentheses are required)
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey()
// Sort by key in descending order
words.map(word => (word, 1)).reduceByKey((a, b) => a + b).sortByKey(false)
Alternatively, swap the key and value and then apply sortByKey:
//Sort By value by swapping key and value and then using sortByKey
val sortbyvalue = words.map( word => (word,1)).reduceByKey((a,b) => a+b)
val descendingSortByvalue = sortbyvalue.map(x => (x._2,x._1)).sortByKey(false)
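// toDF needs spark.implicits._ in scope (spark-shell imports it automatically)
descendingSortByvalue.toDF.show
// After the swap each element is (count, word).
// Collect first so the lines print on the driver in sorted order.
descendingSortByvalue.collect().foreach { case (count, word) =>
  println(s"$word:$count")
}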
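The swap approach above leaves the RDD as (count, word) pairs, whereas the question asks for a (key, count) pair RDD. A possible sketch, assuming a hypothetical helper named sortByValueDescViaSwap, swaps back after sorting. Since map keeps the order of elements within and across partitions, the sorted order survives the second swap:

```scala
import org.apache.spark.rdd.RDD

object SortBySwap {
  // Swap to (count, key), sort by the count descending, then swap back
  // so the result is again a (key, count) pair RDD.
  // The relative order of pairs with equal counts is not specified.
  def sortByValueDescViaSwap(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.map(_.swap).sortByKey(ascending = false).map(_.swap)
}
```

In spark-shell this could be used as SortBySwap.sortByValueDescViaSwap(sortbyvalue).collect() to get the word counts back on the driver, largest count first.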

