Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/38572888/

Time: 2020-10-22 08:29:52  Source: igfitidea

How to use orderby() with descending order in Spark window functions?

Tags: scala, apache-spark, apache-spark-sql, spark-dataframe

Asked by Malte

I need a window function that partitions by some keys (=column names), orders by another column name and returns the rows with top x ranks.

This works fine for ascending order:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
  // Partition keys arrive as one comma-separated string, e.g. "k1, k2"
  val top_keys: List[String] = top_key.split(",").map(_.trim).toList
  val w = Window.partitionBy(top_keys.head, top_keys.tail: _*)
    .orderBy(top_value)
  // Keep the first top_x rows of each partition, then drop the helper column
  val rankCondition = "rn <= " + top_x
  df.withColumn("rn", row_number().over(w))
    .where(rankCondition).drop("rn")
}

But when I change the orderBy call to orderBy(desc(top_value)) or orderBy(top_value.desc), I get a syntax error. What is the correct syntax here?

Answered by Sim

There are two versions of orderBy: one that takes strings and one that takes Column objects (API). Your code uses the first version, which does not allow changing the sort order. You need to switch to the Column version and then call the desc method, e.g., myCol.desc.

Now we get into API design territory. The advantage of passing Column parameters is that you have a lot more flexibility, e.g., you can use expressions. If you want to maintain an API that takes a string rather than a Column, you need to convert the string to a column. There are a number of ways to do this; the easiest is to use org.apache.spark.sql.functions.col(myColName).

Putting it all together, we get

.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
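To show that fragment in context, here is a minimal sketch of a descending-order variant of the question's function. The names getTopXDesc and TopXDescending, the Int/Seq parameter types, and the sample data are assumptions made for illustration and are not part of the original answer; running it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopXDescending {
  // Hypothetical variant of getTopX: rows with the highest topValue come first.
  def getTopXDesc(df: DataFrame, topX: Int, keys: Seq[String], topValue: String): DataFrame = {
    val w = Window
      .partitionBy(keys.map(col): _*)   // convert key names to Column objects
      .orderBy(col(topValue).desc)      // Column version of orderBy, descending
    df.withColumn("rn", row_number().over(w))
      .where(col("rn") <= topX)         // keep the first topX rows per partition
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("top-x-desc").getOrCreate()
    import spark.implicits._
    // Made-up sample data: two partitions, "a" and "b"
    val df = Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 9)).toDF("key", "value")
    getTopXDesc(df, 2, Seq("key"), "value").show()
    spark.stop()
  }
}
```

Taking the keys as a Seq[String] and the limit as an Int avoids the string splitting and string concatenation in the original; whether that suits a given caller is a design choice, not a requirement.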

Answered by Sarath Avanavu

For example, if we need to order by a column called Date in descending order in the Window function, put the $ symbol before the column name, which lets us use the asc or desc syntax.

Window.orderBy($"Date".desc)

After specifying the column name in double quotes, append .desc, which sorts in descending order.
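One detail worth spelling out: the $"..." syntax only compiles once the implicits of a SparkSession are imported. The sketch below, which assumes Spark SQL is on the classpath and uses made-up sample data plus the hypothetical names DescendingSyntaxes and rankNewestFirst, puts the $ form next to the col(...).desc and desc(...) forms from org.apache.spark.sql.functions so their results can be compared.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

object DescendingSyntaxes {
  // Ranks rows by Date, newest first, using three spellings of a descending sort.
  def rankNewestFirst(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings the $"..." interpolator and toDF into scope

    // Made-up data; ISO-formatted date strings sort correctly as plain text
    val df = Seq(("x", "2020-01-01"), ("x", "2020-03-01"), ("x", "2020-02-01")).toDF("id", "Date")

    val byDollar = Window.partitionBy($"id").orderBy($"Date".desc)
    val byCol    = Window.partitionBy(col("id")).orderBy(col("Date").desc)
    val byDesc   = Window.partitionBy(col("id")).orderBy(desc("Date"))

    df.withColumn("rn_dollar", row_number().over(byDollar))
      .withColumn("rn_col", row_number().over(byCol))
      .withColumn("rn_desc", row_number().over(byDesc))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("desc-syntaxes").getOrCreate()
    rankNewestFirst(spark).show()
    spark.stop()
  }
}
```

The three rank columns should agree row by row, so the choice between the spellings is mostly about which imports are already in scope.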

Answered by GPopat

Column

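import org.apache.spark.sql.Column;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
Column col = new Column("ts");
col = col.desc();  // descending sort expression on "ts" (Java API)
WindowSpec w = Window.partitionBy("col1", "col2").orderBy(col);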