SQL: Why do we need GROUP BY with aggregate functions?

Notice: this page reproduces a popular Stack Overflow question and its answers (originally presented as a Chinese-English parallel translation) under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/13998552/

Date: 2020-09-01 12:46:48  Source: igfitidea

Why do we need GROUP BY with AGGREGATE FUNCTIONS?

sql

Asked by david blaine

I saw an example where there was a list (table) of employees with their respective monthly salaries. I did a sum of the salaries and saw the exact same table in the output! That was strange.

Here is what has to be done: we have to find out how much money we pay this month in employee salaries. For that, we need to sum the salary amounts in the database, as shown:

SELECT EmployeeID, SUM(MonthlySalary)
FROM Employee
GROUP BY EmployeeID

I know that I get an error if I don't use GROUP BY in the above code. This is what I don't understand:

We are selecting EmployeeID from the Employee table. SUM() is being told that it has to add up the MonthlySalary column from the Employee table. So it should go directly and add those numbers up, instead of grouping them and then adding them.

That's how a person would do it: look at the Employee table and add up all the numbers. Why would they take the trouble to group them and then add them up?

Answered by Abe Miessler

It might be easier if you think of GROUP BY as "for each" for the sake of explanation. The query below:

SELECT empid, SUM(MonthlySalary)
FROM Employee
GROUP BY empid

is saying:

"Give me the sum of MonthlySalary for each empid"

So if your table looked like this:

+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|2    |300          |
+-----+-------------+

result:

+-+---+
|1|200|
+-+---+
|2|300|
+-+---+

Sum wouldn't appear to do anything because the sum of one number is that number. On the other hand if it looked like this:

+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|1    |300          |
+-----+-------------+
|2    |300          |
+-----+-------------+

result:

+-+---+
|1|500|
+-+---+
|2|300|
+-+---+

Then SUM would make a visible difference, because there are two rows with empid 1 to add together. Not sure if this explanation helps or not, but I hope it makes things a little clearer.
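
The two cases above can be reproduced with a small script. This is only a minimal sketch: it assumes Python's built-in sqlite3 module and an in-memory Employee table holding the same rows as the example tables, and the helper name sum_per_employee is made up for illustration.

```python
import sqlite3

def sum_per_employee(rows):
    """Return (empid, total) pairs for the given (empid, MonthlySalary) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (empid INTEGER, MonthlySalary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    # "For each empid, give me the sum of MonthlySalary."
    result = conn.execute(
        "SELECT empid, SUM(MonthlySalary) FROM Employee "
        "GROUP BY empid ORDER BY empid"
    ).fetchall()
    conn.close()
    return result

# One row per employee: SUM appears to do nothing.
print(sum_per_employee([(1, 200), (2, 300)]))            # [(1, 200), (2, 300)]
# Two rows for empid 1: they collapse into a single total.
print(sum_per_employee([(1, 200), (1, 300), (2, 300)]))  # [(1, 500), (2, 300)]
```

The ORDER BY is there only to make the printed order predictable; GROUP BY on its own does not promise any particular row order.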

Answered by Mark Byers

If you wanted to add up all the numbers you would not have a GROUP BY:


SELECT SUM(MonthlySalary) AS TotalSalary
FROM Employee

+-----------+
|TotalSalary|
+-----------+
|777400     |
+-----------+

The point of the GROUP BY is that you get a separate total for each employee.

+--------+------+
|Employee|Salary|
+--------+------+
|John    |123400|
+--------+------+
|Frank   |413000|
+--------+------+
|Bill    |241000|
+--------+------+
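
Both queries can be run side by side. The sketch below assumes Python's built-in sqlite3 module. The column name Name and the individual salary rows are invented for illustration, chosen only so that the totals match the figures in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, MonthlySalary INTEGER)")
# Hypothetical rows: some employees have more than one salary entry.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [
        ("John", 100000), ("John", 23400),
        ("Frank", 413000),
        ("Bill", 200000), ("Bill", 41000),
    ],
)

# No GROUP BY: one total over every row in the table.
grand_total = conn.execute(
    "SELECT SUM(MonthlySalary) AS TotalSalary FROM Employee"
).fetchone()[0]

# GROUP BY Name: a separate total for each employee.
per_employee = dict(
    conn.execute(
        "SELECT Name, SUM(MonthlySalary) FROM Employee GROUP BY Name"
    ).fetchall()
)

print(grand_total)   # 777400
print(per_employee)  # John: 123400, Frank: 413000, Bill: 241000 (order may vary)
```

The per-employee totals add up to the grand total, which is a quick way to check that the two queries are summarizing the same rows at different levels.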

Answered by Gordon Linoff

The sad thing is that there is one database that supports the syntax you are suggesting:

SELECT EmployeeID, SUM(MonthlySalary)
FROM Employee

However, MySQL does not do what you expect. It returns the overall sum of MonthlySalary for everyone, along with one arbitrary EmployeeID. Alas.
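
This effect can be observed without a MySQL server, because SQLite also happens to accept a bare column next to an aggregate. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it illustrates the general behavior rather than MySQL itself, and the table contents are invented. Note also that recent MySQL versions reject such a query by default when the ONLY_FULL_GROUP_BY SQL mode is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeID INTEGER, MonthlySalary INTEGER)")
# Hypothetical data, one salary row per employee.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 200), (2, 300), (3, 500)],
)

# No GROUP BY: the whole table is treated as one group, so one row comes back.
rows = conn.execute(
    "SELECT EmployeeID, SUM(MonthlySalary) FROM Employee"
).fetchall()
employee_id, total = rows[0]

print(len(rows))  # 1
print(total)      # 1000
# employee_id comes from some row of the table; which one should not be relied on.
print(employee_id in (1, 2, 3))  # True
```

The single EmployeeID in the result says nothing about the total next to it, which is why most databases refuse the query outright.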

Your question is about SQL syntax. The answer is that this is how SQL has been defined, and it is not going to change. Determining the aggregation fields from the SELECT clause is not unreasonable, but it is not how this language is defined.

I do, however, have some sympathy for the question. Many people learning SQL think of "grouping" as something done in the context of sorting the rows. Something like "sort the cities in the US and group them by state in the output". Makes sense. But "group by" in SQL really means "summarize by" not "keep together".
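
The difference between "keep together" and "summarize by" can be seen by running ORDER BY and GROUP BY over the same rows. This is a minimal sketch assuming Python's built-in sqlite3 module; the City table and its four rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO City VALUES (?, ?)",
    [("Austin", "TX"), ("Albany", "NY"), ("Dallas", "TX"), ("Buffalo", "NY")],
)

# "Keep together": ORDER BY returns every row, with each state's cities adjacent.
kept_together = conn.execute(
    "SELECT State, Name FROM City ORDER BY State, Name"
).fetchall()

# "Summarize by": GROUP BY returns one row per state.
summarized = conn.execute(
    "SELECT State, COUNT(*) FROM City GROUP BY State ORDER BY State"
).fetchall()

print(kept_together)
# [('NY', 'Albany'), ('NY', 'Buffalo'), ('TX', 'Austin'), ('TX', 'Dallas')]
print(summarized)
# [('NY', 2), ('TX', 2)]
```

ORDER BY keeps all four rows and only rearranges them, while GROUP BY collapses them into one summary row per state.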

Answered by Barmar

If you don't specify GROUP BY, aggregate functions operate over all the records selected. In that case, it doesn't make sense to also select a specific column like EmployeeID. Either you want per-employee totals, in which case you select the employee ID and group by employee, or you want a total across the entire table, in which case you leave out both the employee ID and the GROUP BY clause.

In your query, if you leave out the GROUP BY, which employee ID would you like it to show?
