
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original source: http://stackoverflow.com/questions/30332619/

Date: 2020-10-22 07:10:15  Source: igfitidea

How to sort by column in descending order in Spark SQL?

Tags: scala, apache-spark, apache-spark-sql

Asked by Vedom

I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on StackOverflow and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame in Spark.


Accepted answer by Vedom

It's in the sort method of org.apache.spark.sql.DataFrame:


df.sort($"col1", $"col2".desc)

Note the $ and the .desc inside sort, which select the column to sort the results by and flip it to descending order.

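Spark aside, the ascending-vs-descending mechanics can be sketched with plain Scala collections; here Ordering[Int].reverse plays the role that .desc (or desc("col1")) plays on a DataFrame column. The object and helper names below are made up for illustration:

```scala
// Plain-Scala sketch of ascending vs. descending sorts (no Spark needed).
// Ordering[Int].reverse is analogous to .desc on a Spark column.
object DescSortSketch {
  // Sort pairs by their Int key, ascending -- like df.orderBy(asc("col1"))
  def sortAsc[A](xs: Seq[(Int, A)]): Seq[(Int, A)] =
    xs.sortBy(_._1)

  // Sort pairs by their Int key, descending -- like df.orderBy(desc("col1"))
  def sortDesc[A](xs: Seq[(Int, A)]): Seq[(Int, A)] =
    xs.sortBy(_._1)(Ordering[Int].reverse)
}
```

For example, sortDesc(Seq((1, "a"), (3, "c"), (2, "b"))) returns the rows with keys in the order 3, 2, 1.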

Answered by Gabber

You can also sort the column by importing the Spark SQL functions:


import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))

Or


import org.apache.spark.sql.functions._
df.sort(desc("col1"))

Or by importing sqlContext.implicits._:


import sqlContext.implicits._
df.orderBy($"col1".desc)

Or


import sqlContext.implicits._
df.sort($"col1".desc)

Answered by Nic Scozzaro

PySpark only


I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:


df.orderBy("col1", ascending=False).show(10)

Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy


Answered by Nitya Yekkirala

import org.apache.spark.sql.functions.{asc, desc}

df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
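The mixed-order, multi-column behaviour above can likewise be sketched with plain Scala collections, using a composite Ordering whose reversed components correspond to desc and whose plain component corresponds to asc. The row shape and object name below are invented for illustration:

```scala
// Plain-Scala sketch of a three-key sort: descending on the first two keys,
// ascending on the third -- mirroring
// df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3")).
object MultiKeySortSketch {
  def sortRows(rows: Seq[(Int, Int, Int)]): Seq[(Int, Int, Int)] =
    rows.sorted(
      Ordering.Tuple3(Ordering[Int].reverse, Ordering[Int].reverse, Ordering[Int])
    )
}
```

For example, sortRows(Seq((1, 1, 2), (2, 1, 1), (1, 2, 3), (1, 1, 1))) puts (2, 1, 1) first, then (1, 2, 3), then the (1, 1, _) rows in ascending order of the third key.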

Answered by Nilesh Shinde

df.sort($"ColumnName".desc).show()

Answered by RPaul

In the case of Java:


If we use DataFrames, then while applying a join (here an inner join) we can sort (in ascending order) after selecting distinct elements in each DF:


Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");

where e_id is the column on which the join is applied, and the result is sorted by salary in ascending order.


Also, we can use Spark SQL as:


SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();

where

  • spark  -> the SparkSession
  • salary -> a GlobalTemp view.