How to sort the data on multiple columns in Apache Spark Scala?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license, citing the original URL and attributing it to the original authors (not me): StackOverFlow.
Original question: http://stackoverflow.com/questions/36717510/
Asked by Niranjanp
I have a data set like this, which I read from a CSV file and convert into an RDD using Scala.
+--------+------+---------+
| recent | Freq | Monitor |
+--------+------+---------+
|      1 | 1234 |  199090 |
|      4 | 2553 |  198613 |
|      6 | 3232 |  199090 |
|      1 | 8823 |  498831 |
|      7 | 2902 |  890000 |
|      8 | 7991 |  081097 |
|      9 | 7391 |  432370 |
|     12 | 6138 |  864981 |
|      7 | 6812 |  749821 |
+--------+------+---------+
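For context, here is a minimal sketch of how such a file could be loaded into Spark (this is not from the original post; Spark 2.x, local mode and the file name data.csv are assumptions):

import org.apache.spark.sql.SparkSession

// Hypothetical loading step -- file name and options are assumptions, not from the question.
val spark = SparkSession.builder().appName("sort-example").master("local[*]").getOrCreate()
val df = spark.read
  .option("header", "true")       // the first line holds recent, Freq, Monitor
  .option("inferSchema", "true")  // let Spark infer numeric column types
  .csv("data.csv")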
How do I sort the data on all columns?
Thanks
Answered by Steve
Suppose your input RDD/DataFrame is called df.
To sort recent in descending order, and both Freq and Monitor in ascending order, you can do:
import org.apache.spark.sql.functions._
val sorted = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))
You can use df.orderBy(...) as well; it is an alias of sort().
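An equivalent sketch using the Column API (assuming the DataFrame df from above; the variable name sortedAlt is illustrative):

import org.apache.spark.sql.functions.col

// Same ordering as the sort(...) call above, expressed with Column objects
// instead of the asc/desc helper functions.
val sortedAlt = df.orderBy(col("recent").desc, col("Freq").asc, col("Monitor").asc)
sortedAlt.show()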
Answered by Zahiro Mor
csv.sortBy(r => (r.recent, r.freq)) or equivalent should do it.
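This assumes the RDD elements expose recent and freq fields, e.g. via a case class. A minimal sketch of that approach (the case class, the parsing, the file name and the existing SparkContext sc are assumptions, not from the original answer):

case class Record(recent: Int, freq: Int, monitor: String)

// Parse each CSV line into Record (skipping the header), then sort by (recent, freq) ascending.
// A tuple key works because Scala provides an implicit Ordering for tuples of ordered types.
val sortedRdd = sc.textFile("data.csv")
  .filter(line => !line.startsWith("recent"))
  .map { line =>
    val parts = line.split(",").map(_.trim)
    Record(parts(0).toInt, parts(1).toInt, parts(2))
  }
  .sortBy(r => (r.recent, r.freq))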

