SQL 使用 group by 和 have 子句
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16327954/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Using group by and having clause
提问by user2341124
Using the following schema:
使用以下架构:
Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid**, quantity)
Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
Get supplier numbers and names for suppliers of the same part to at least two different projects.
获取供应给至少两个不同项目的零件的供应商编号和名称。
获取至少两个不同项目的同一部件的供应商的供应商编号和名称。
These were my answers:
这些是我的回答:
1.
1.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
2.
2.
SELECT s.sid, s.name
FROM Suppliers s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2
Can anyone confirm if I wrote this correctly? I'm a little confused as to how the Group By and Having clause works
谁能确认我写的是否正确?我对 Group By 和 Have 子句的工作方式有些困惑
回答by dmg
The semantics of Having
拥有的语义
To better understand having, you need to see it from a theoretical point of view.
为了更好地理解拥有,你需要从理论的角度来看待它。
A group byis a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.
甲按组是一个查询,需要一个表,并将其汇总成另一个表。您可以通过将原始表分组为子集来汇总原始表(基于您在分组依据中指定的属性)。这些组中的每一个都将产生一个元组。
The Havingis simply equivalent to a WHERE clause afterthe group by has executed and before the selectpart of the query is computed.
的有是简单地等同于WHERE子句后的组通过已执行,并且之前选择查询的一部分进行计算。
Lets say your query is:
假设您的查询是:
select a, b, count(*)
from Table
where c > 100
group by a, b
having count(*) > 10;
The evaluation of this query can be seen as the following steps:
这个查询的评估可以看作是以下步骤:
- Perform the WHERE, eliminating rows that do not satisfy it.
- Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
- Eliminate subsets that do not satisfy the HAVING condition
- Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
- 执行 WHERE,消除不满足它的行。
- 根据 a 和 b 的值将表分组为子集(每个子集中的每个元组具有相同的 a 和 b 值)。
- 消除不满足 HAVING 条件的子集
- 处理每个子集,输出在查询的 SELECT 部分中指示的值。这将在步骤 3 之后为每个子集创建一个输出元组。
You can extend this to any complex query there Table can be any complex query that return a table (a cross product, a join, a UNION, etc).
您可以将其扩展到任何复杂的查询,表可以是任何返回表的复杂查询(叉积、联接、联合等)。
In fact, havingis syntactic sugarand does not extend the power of SQL. Any given query:
事实上,拥有只是语法糖,并没有扩展 SQL 的功能。任何给定的查询:
SELECT list
FROM table
GROUP BY attrList
HAVING condition;
can be rewritten as:
可以改写为:
SELECT list from (
SELECT listatt
FROM table
GROUP BY attrList) as Name
WHERE condition;
The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:
listatt 是一个列表,其中包括 GROUP BY 属性以及列表和条件中使用的表达式。可能需要在此列表中命名一些表达式(使用 AS)。例如,上面的示例查询可以改写为:
select a, b, count
from (select a, b, count(*) as count
from Table
where c > 100
group by a, b) as someName
where count > 10;
The solution you need
您需要的解决方案
Your solution seems to be correct:
您的解决方案似乎是正确的:
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
You join the three tables, then using sid as a grouping attribute (sname is functionally dependent on it, so it does not have an impact on the number of groups, but you mustinclude it, otherwise it cannot be part of the select part of the statement). Then you are removing those that do not satisfy your condition: the satisfy pr.jid is >= 2
, which is that you wanted originally.
你join这三个表,然后使用sid作为分组属性(sname在功能上依赖于它,所以它对组数没有影响,但是你必须包含它,否则它不能成为select部分的一部分该声明)。然后您将删除那些不满足您的条件的:满足pr.jid is >= 2
,这是您最初想要的。
Best solution to your problem
您的问题的最佳解决方案
I personally prefer a simpler cleaner solution:
我个人更喜欢更简单的清洁解决方案:
- You need to only group by Supplies (sid, pid, jid**, quantity) to find the sid of those that supply at least to two projects.
- Then join it to the Suppliers table to get the supplier same.
- 您只需按 Supplies (sid、pid、jid**、quantity) 分组即可找到至少供应两个项目的那些的 sid。
- 然后将其加入到供应商表中以获取相同的供应商。
SELECT sid, sname from
(SELECT sid from supplies
GROUP BY sid, pid
HAVING count(DISTINCT jid) >= 2
) AS T1
NATURAL JOIN
Supliers;
It will also be faster to execute, because the join is only done when needed, not all the times.
执行速度也会更快,因为连接仅在需要时完成,而不是所有时间。
--dmg
--dmg
回答by Harry
Because we can not use Where clause with aggregate functions like count(),min(), sum()etc. so having clause came into existence to overcome this problem in sql. see example for having clause go through this link
因为我们不能使用WHERE子句与聚合函数像数(),MIN(),和()等,所以having子句应运而生,以克服在SQL这个问题。请参阅让子句通过此链接的示例
回答by plalx
First of all, you should use the JOIN
syntax rather than FROM table1, table2
, and you should always limit the grouping to as little fields as you need.
首先,您应该使用JOIN
语法而不是FROM table1, table2
,并且您应该始终将分组限制为您需要的尽可能少的字段。
Altought I haven't tested, your first query seems fine to me, but could be re-written as:
我还没有测试过,你的第一个查询对我来说似乎很好,但可以重写为:
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
Or simplified as:
或简化为:
SELECT sid, name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
) > 1
However, your second query seems wrong to me, because you should also GROUP BY pid
.
但是,您的第二个查询对我来说似乎是错误的,因为您还应该GROUP BY pid
.
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
As you may have noticed in the query above, I used the INNER JOIN
syntax to perform the filtering, however it can be also written as:
正如您在上面的查询中可能已经注意到的那样,我使用了INNER JOIN
语法来执行过滤,但它也可以写为:
SELECT s.sid, s.name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
GROUP BY su.sid, su.pid
) > 1
回答by cbp
What type of sql database are using (MSSQL, Oracle etc)? I believe what you have written is correct.
使用什么类型的 sql 数据库(MSSQL、Oracle 等)?我相信你所写的是正确的。
You could also write the first query like this:
您也可以像这样编写第一个查询:
SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
FROM Supplies su, Projects pr
WHERE su.sid = s.sid
AND pr.jid = su.jid) >= 2
It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.
与尝试使用 GROUP BY 相比,它更具可读性,也更容易让人费解。性能可能有所不同。
回答by Santhosh
1.Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
1.获取供应给至少两个不同项目的零件的供应商编号和名称。
SELECT S.SID, S.NAME
FROM SUPPLIES SP
JOIN SUPPLIER S
ON SP.SID = S.SID
WHERE PID IN
(SELECT PID FROM SUPPPLIES GROUP BY PID, JID HAVING COUNT(*) >= 2)
I am not slear about your second question
我不太清楚你的第二个问题