Python Spark DataFrame groupBy and sort in descending order (pyspark)
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34514545/
Spark DataFrame groupBy and sort in the descending order (pyspark)
Asked by rclakmal
I'm using pyspark (Python 2.7.9 / Spark 1.3.1) and have a DataFrame GroupObject which I need to filter and sort in descending order. I'm trying to achieve this via the following piece of code.
group_by_dataframe.count().filter("`count` >= 10").sort('count', ascending=False)
But it throws the following error.
sort() got an unexpected keyword argument 'ascending'
Accepted answer by zero323
In PySpark 1.3 the sort method doesn't take an ascending parameter. You can use the desc method instead:
from pyspark.sql.functions import col
(group_by_dataframe
.count()
.filter("`count` >= 10")
.sort(col("count").desc()))
or the desc function:
from pyspark.sql.functions import desc
(group_by_dataframe
.count()
.filter("`count` >= 10")
.sort(desc("count"))
Both methods can be used with Spark >= 1.3 (including Spark 2.x).
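For context, here is a runnable end-to-end sketch of both variants (assuming Spark 2.x with SparkSession; the sample word data and names below are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, desc

spark = SparkSession.builder.appName("groupby-sort-demo").getOrCreate()

# Hypothetical input: one row per word occurrence.
df = spark.createDataFrame([("a",), ("a",), ("b",), ("a",), ("b",), ("c",)], ["word"])
grouped = df.groupBy("word").count()

# Variant 1: the desc method on a Column object.
grouped.filter("`count` >= 2").sort(col("count").desc()).show()

# Variant 2: the desc function from pyspark.sql.functions.
grouped.filter("`count` >= 2").sort(desc("count")).show()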
Answered by Henrique Florêncio
Use orderBy:
group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)
http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html
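In recent PySpark versions orderBy is simply an alias for sort, so the keyword form from the question works there too (df is assumed to already have a count column):

# Equivalent in recent PySpark versions, where orderBy is an alias for sort:
df.sort('count', ascending=False)
df.orderBy('count', ascending=False)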
Answered by Narendra Maru
You can also combine groupBy and sort as follows. Note that after withColumnRenamed the count column is called "distinct_name", so the sort has to reference the new name:

from pyspark.sql.functions import desc

dataFrameWay = df.groupBy("firstName").count().withColumnRenamed("count", "distinct_name").sort(desc("distinct_name"))
Answered by gdoron is supporting Monica
By far the most convenient way is to use this:
df.orderBy(df.column_name.desc())
It doesn't require any special imports.
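If the column name is not a valid Python attribute name (it contains spaces, dots, etc.), bracket indexing gives the same result (df and column_name are placeholders):

# Same result as df.orderBy(df.column_name.desc()), but works for
# column names that are not valid Python attributes:
df.orderBy(df["column_name"].desc())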
Answered by Prabhath Kota
In pyspark 2.4.4
1) group_by_dataframe.count().filter("`count` >= 10").orderBy('count', ascending=False)
2) from pyspark.sql.functions import desc
group_by_dataframe.count().filter("`count` >= 10").sort(desc('count'))
Option 1) needs no import and is short and easy to read,
so I prefer 1) over 2).