Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44384102/

Why agg() in PySpark is only able to summarize one column at a time?

python apache-spark pyspark apache-spark-sql pyspark-sql

Asked by GeorgeOfTheRF

For the DataFrame below:

df = spark.createDataFrame(data=[('Alice', 4.300), ('Bob', 7.677)], schema=['name', 'High'])

When I try to compute both the min and the max, the output contains only the min value.

df.agg({'High':'max','High':'min'}).show()
+---------+
|min(High)|
+---------+
|      4.3|
+---------+

Why can't agg() return both the max and the min, as in pandas?

Answered by titiro89

As the PySpark documentation says:

agg(*exprs)

Compute aggregates and returns the result as a DataFrame.

The available aggregate functions are avg, max, min, sum, count.

If exprs is a single dict mapping from string to string, then the key is the column to perform aggregation on, and the value is the aggregate function.

Alternatively, exprs can also be a list of aggregate Column expressions.

Parameters: exprs – a dict mapping from column name (string) to aggregate functions (string), or a list of Column.
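
The dict form accounts for the behaviour in the question. A Python dict cannot hold the same key twice, so in a literal with a repeated key the later entry silently replaces the earlier one before agg() is ever called. A minimal sketch in plain Python, no Spark needed (the variable name exprs is only for illustration):

```python
# A dict literal with a repeated key keeps only the last value for that key.
exprs = {'High': 'max', 'High': 'min'}

print(exprs)       # {'High': 'min'}
print(len(exprs))  # 1
```

So agg() receives a single entry, 'High' mapped to 'min', and never sees the request for max.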

You can pass a list of Column expressions, applying whichever aggregate function you need to each column, like this:

>>> from pyspark.sql import functions as F

>>> df.agg(F.min(df.High),F.max(df.High),F.avg(df.High),F.sum(df.High)).show()
+---------+---------+---------+---------+
|min(High)|max(High)|avg(High)|sum(High)|
+---------+---------+---------+---------+
|      4.3|    7.677|   5.9885|   11.977|
+---------+---------+---------+---------+
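
If you would rather describe the aggregations as data, one option is to map each column name to a list of function names and expand that into Column expressions. The sketch below is only an illustration under stated assumptions: expand_spec and agg_many are hypothetical helper names, not part of PySpark, and agg_many assumes every function name exists in pyspark.sql.functions. PySpark is imported lazily inside agg_many, so the expansion step can be tried without a Spark installation.

```python
def expand_spec(spec):
    """Expand {'col': ['fn1', 'fn2']} into [('fn1', 'col'), ('fn2', 'col')]."""
    return [(fn, col) for col, fns in spec.items() for fn in fns]


def agg_many(df, spec):
    """Apply every (function, column) pair from spec in a single agg() call.

    Hypothetical helper: assumes each function name is an attribute of
    pyspark.sql.functions (for example 'min', 'max', 'avg', 'sum').
    """
    from pyspark.sql import functions as F  # lazy import; requires PySpark

    columns = [
        getattr(F, fn)(col).alias(f"{fn}_{col}")  # e.g. F.min('High') as min_High
        for fn, col in expand_spec(spec)
    ]
    return df.agg(*columns)


pairs = expand_spec({'High': ['min', 'max']})
print(pairs)  # [('min', 'High'), ('max', 'High')]
```

With the DataFrame from the question, agg_many(df, {'High': ['min', 'max']}).show() would be expected to return one row with the columns min_High and max_High. Using a list as the dict value sidesteps the repeated-key problem, because each column name appears only once.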