Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow, original question at http://stackoverflow.com/questions/36717510/

How to sort the data on multiple columns in Apache Spark (Scala)?

Tags: scala, apache-spark

Asked by Niranjanp

I have a data set like this, which I read from a CSV file and convert into an RDD using Scala.

+--------+------+---------+
| recent | Freq | Monitor |
+--------+------+---------+
|      1 | 1234 |  199090 |
|      4 | 2553 |  198613 |
|      6 | 3232 |  199090 |
|      1 | 8823 |  498831 |
|      7 | 2902 |  890000 |
|      8 | 7991 |  081097 |
|      9 | 7391 |  432370 |
|     12 | 6138 |  864981 |
|      7 | 6812 |  749821 |
+--------+------+---------+

How can I sort the data on all columns?

Thanks

Answered by Steve

Suppose your input RDD/DataFrame is called df.

To sort recent in descending order, and Freq and Monitor both in ascending order, you can do:

import org.apache.spark.sql.functions._

val sorted = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))

You can use df.orderBy(...) as well; it is an alias of sort().
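
For reference, here is a minimal sketch of the sort on a small DataFrame, written for spark-shell or a Scala script with Spark on the classpath. It assumes df is a DataFrame, since desc and asc build Column expressions, which an RDD does not accept. The local SparkSession, the app name, the five sample rows (taken from the question's table) and the choice to hold Monitor as a string are all assumptions made for illustration. The last part shows one possible way to sort on every column, which is what the question asks for.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, col, desc}

// Assumption: a local session for experimenting; in spark-shell, getOrCreate() reuses the existing one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multi-column-sort")
  .getOrCreate()
import spark.implicits._

// A few rows from the question's table (Monitor held as a string for this sketch)
val df = Seq(
  (1, 1234, "199090"),
  (4, 2553, "198613"),
  (1, 8823, "498831"),
  (7, 6812, "749821"),
  (7, 2902, "890000")
).toDF("recent", "Freq", "Monitor")

// recent descending, then Freq and Monitor ascending to break ties
val mixedSort = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))

// Sort on all columns, ascending, using the columns in their declared order
val allColumnsSort = df.sort(df.columns.map(c => col(c)): _*)

mixedSort.show()
allColumnsSort.show()
```

Passing df.columns through col keeps the call independent of the column names, so the same line works if the CSV gains or loses columns.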

Answered by Zahiro Mor

csv.sortBy(r => (r.recent, r.freq)) or an equivalent should do it.
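
To illustrate how a tuple key orders records, here is a small sketch that uses a plain Scala Seq in place of the RDD so it runs without Spark. The Record case class and its field names are hypothetical stand-ins for one parsed CSV row. On an RDD[Record], the same key functions can be passed to sortBy.

```scala
// Hypothetical record type standing in for one parsed CSV row
case class Record(recent: Int, freq: Int, monitor: String)

val rows = Seq(
  Record(1, 1234, "199090"),
  Record(4, 2553, "198613"),
  Record(1, 8823, "498831"),
  Record(7, 6812, "749821"),
  Record(7, 2902, "890000")
)

// Tuple keys compare element by element: recent first, then freq, then monitor
val ascendingRows = rows.sortBy(r => (r.recent, r.freq, r.monitor))

// Negating a numeric key reverses that key only: recent descending, freq ascending
// (assumes the values are not Int.MinValue, which cannot be negated)
val mixedRows = rows.sortBy(r => (-r.recent, r.freq))

ascendingRows.foreach(println)
mixedRows.foreach(println)
```

The tuple ordering comes from the implicit Ordering instances in the Scala standard library, so no custom comparator is needed as long as each key component has an Ordering of its own.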
