Scala: How to sort by column in descending order in Spark SQL?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Source: http://stackoverflow.com/questions/30332619/
Asked by Vedom
I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on Stack Overflow, and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame in Spark.
Accepted answer by Vedom
It's in the sort method of org.apache.spark.sql.DataFrame:
df.sort($"col1", $"col2".desc)
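Note the $ and .desc inside sort: $"col2" refers to a column, and .desc sorts by that column in descending order. In this call col1 is sorted in ascending order (the default) and col2 in descending order.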
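For readers who want to try the call end to end, here is a minimal sketch. It assumes Spark 2.x or later, where SparkSession and spark.implicits._ take the place of sqlContext.implicits._; the sample rows, the object name SortDescExample, and the helper sortExample are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SortDescExample {
  // Builds a small hypothetical DataFrame and sorts it the same way as the answer:
  // col1 ascending (the default), then col2 descending within equal col1 values.
  def sortExample(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables the $"..." column syntax and toDF
    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("col1", "col2")
    df.sort($"col1", $"col2".desc)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its master from spark-submit
    val spark = SparkSession.builder()
      .appName("sort-desc-example")
      .master("local[*]")
      .getOrCreate()

    sortExample(spark).show() // rows appear as (a, 3), (a, 1), (b, 2)

    spark.stop()
  }
}
```

To sort col1 itself in descending order, as the question asks, the call would be df.sort($"col1".desc).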
Answer by Gabber
You can also sort by a column by importing the Spark SQL functions:
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
Or
import org.apache.spark.sql.functions._
df.sort(desc("col1"))
Or by importing sqlContext.implicits._:
import sqlContext.implicits._
df.orderBy($"col1".desc)
Or
import sqlContext.implicits._
df.sort($"col1".desc)
Answer by Nic Scozzaro
PySpark only
I came across this post when looking to do the same in PySpark. The easiest way is to add the parameter ascending=False:
df.orderBy("col1", ascending=False).show(10)
Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
Answer by Nitya Yekkirala
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
Answer by Nilesh Shinde
df.sort($"ColumnName".desc).show()
Answer by RPaul
In the case of Java:
With DataFrames, when applying a join (here an inner join), we can sort in ascending order after selecting the distinct rows of each DataFrame:
Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");
where e_id is the column the join is applied on, and the result is sorted by salary in ascending order.
We can also use Spark SQL:
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
where
- spark -> SparkSession
- salary -> global temp view.
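The query above only works once a global temp view named salary has been registered. A rough sketch of that step follows, written in Scala to match the rest of the thread rather than Java. It assumes Spark 2.2 or later (for createOrReplaceGlobalTempView); the sample rows, the object name GlobalTempViewSort, and the helper sortedBySalaryDesc are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GlobalTempViewSort {
  // Registers hypothetical salary data as a global temp view, then sorts it with SQL.
  def sortedBySalaryDesc(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for toDF
    val salaries = Seq((1, 3000), (2, 5000), (3, 4000)).toDF("e_id", "salary")

    // Global temp views live in the reserved global_temp database
    salaries.createOrReplaceGlobalTempView("salary")

    spark.sql("select * from global_temp.salary order by salary desc")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .appName("global-temp-view-sort")
      .master("local[*]")
      .getOrCreate()

    sortedBySalaryDesc(spark).show() // salaries appear as 5000, 4000, 3000

    spark.stop()
  }
}
```

The same SQL string can be passed to sqlCtx.sql(...) as in the answer; spark.sql(...) is used here only to avoid creating a separate SQLContext.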

