Why is agg() in PySpark only able to summarize one column at a time?

Disclaimer: this page is an English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow.

Original URL: http://stackoverflow.com/questions/44384102/
Asked by GeorgeOfTheRF
For the dataframe below:
df = spark.createDataFrame(data=[('Alice', 4.300), ('Bob', 7.677)], schema=['name', 'High'])
When I try to find both the min and the max, only the min value appears in the output:
df.agg({'High': 'max', 'High': 'min'}).show()
+---------+
|min(High)|
+---------+
|      4.3|
+---------+
Why can't agg() return both max and min, as it does in Pandas?
Answered by titiro89
As you can see in the documentation:
agg(*exprs)
Compute aggregates and returns the result as a DataFrame.
The available aggregate functions are avg, max, min, sum, count.
If exprs is a single dict mapping from string to string, then the key is the column to perform aggregation on, and the value is the aggregate function.
Alternatively, exprs can also be a list of aggregate Column expressions.
Parameters: exprs – a dict mapping from column name (string) to aggregate function (string), or a list of Column.
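The root cause is worth making explicit: a Python dict cannot hold two entries with the same key, so in `{'High':'max','High':'min'}` the second `'High'` silently overwrites the first, and agg() only ever receives the min request. A minimal plain-Python sketch of that behavior:

```python
# In a dict literal with duplicate keys, the last entry wins,
# so agg() never sees the 'max' request at all.
exprs = {'High': 'max', 'High': 'min'}
print(exprs)  # {'High': 'min'}
```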
You can instead pass a list of Column expressions and apply whatever function you need to each column, like this:
>>> from pyspark.sql import functions as F
>>> df.agg(F.min(df.High), F.max(df.High), F.avg(df.High), F.sum(df.High)).show()
+---------+---------+---------+---------+
|min(High)|max(High)|avg(High)|sum(High)|
+---------+---------+---------+---------+
| 4.3| 7.677| 5.9885| 11.977|
+---------+---------+---------+---------+