SQL 使用 group by 和 have 子句

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16327954/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 15:04:12  来源:igfitidea点击:

Using group by and having clause

sql

提问by user2341124

Using the following schema:

使用以下架构:

Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid**, quantity)
  1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.

  2. Get supplier numbers and names for suppliers of the same part to at least two different projects.

  1. 获取供应给至少两个不同项目的零件的供应商编号和名称。

  2. 获取至少两个不同项目的同一部件的供应商的供应商编号和名称。

These were my answers:

这些是我的回答:

1.

1.

SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2 

2.

2.

SELECT s.sid, s.name
FROM Suppliers s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2

Can anyone confirm if I wrote this correctly? I'm a little confused as to how the Group By and Having clause works

谁能确认我写的是否正确?我对 Group By 和 Have 子句的工作方式有些困惑

回答by dmg

The semantics of Having

拥有的语义

To better understand having, you need to see it from a theoretical point of view.

为了更好地理解拥有,你需要从理论的角度来看待它。

A group byis a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.

按组是一个查询,需要一个表,并将其汇总成另一个表。您可以通过将原始表分组为子集来汇总原始表(基于您在分组依据中指定的属性)。这些组中的每一个都将产生一个元组。

The Havingis simply equivalent to a WHERE clause afterthe group by has executed and before the selectpart of the query is computed.

是简单地等同于WHERE子句的组通过已执行,并且之前选择查询的一部分进行计算。

Lets say your query is:

假设您的查询是:

select a, b, count(*) 
from Table 
where c > 100 
group by a, b 
having count(*) > 10;

The evaluation of this query can be seen as the following steps:

这个查询的评估可以看作是以下步骤:

  1. Perform the WHERE, eliminating rows that do not satisfy it.
  2. Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
  3. Eliminate subsets that do not satisfy the HAVING condition
  4. Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
  1. 执行 WHERE,消除不满足它的行。
  2. 根据 a 和 b 的值将表分组为子集(每个子集中的每个元组具有相同的 a 和 b 值)。
  3. 消除不满足 HAVING 条件的子集
  4. 处理每个子集,输出在查询的 SELECT 部分中指示的值。这将在步骤 3 之后为每个子集创建一个输出元组。

You can extend this to any complex query there Table can be any complex query that return a table (a cross product, a join, a UNION, etc).

您可以将其扩展到任何复杂的查询,表可以是任何返回表的复杂查询(叉积、联接、联合等)。

In fact, havingis syntactic sugarand does not extend the power of SQL. Any given query:

事实上,拥有只是语法糖,并没有扩展 SQL 的功能。任何给定的查询:

SELECT list 
FROM table
GROUP BY attrList
HAVING condition;

can be rewritten as:

可以改写为:

SELECT list from (
   SELECT listatt 
   FROM table 
   GROUP BY attrList) as Name
WHERE condition;

The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:

listatt 是一个列表,其中包括 GROUP BY 属性以及列表和条件中使用的表达式。可能需要在此列表中命名一些表达式(使用 AS)。例如,上面的示例查询可以改写为:

select a, b, count 
from (select a, b, count(*) as count
      from Table 
      where c > 100
      group by a, b) as someName
where count > 10;

The solution you need

您需要的解决方案

Your solution seems to be correct:

您的解决方案似乎是正确的:

SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2 

You join the three tables, then using sid as a grouping attribute (sname is functionally dependent on it, so it does not have an impact on the number of groups, but you mustinclude it, otherwise it cannot be part of the select part of the statement). Then you are removing those that do not satisfy your condition: the satisfy pr.jid is >= 2, which is that you wanted originally.

你join这三个表,然后使用sid作为分组属性(sname在功能上依赖于它,所以它对组数没有影响,但是你必须包含它,否则它不能成为select部分的一部分该声明)。然后您将删除那些不满足您的条件的:满足pr.jid is >= 2,这是您最初想要的。

Best solution to your problem

您的问题的最佳解决方案

I personally prefer a simpler cleaner solution:

我个人更喜欢更简单的清洁解决方案:

  1. You need to only group by Supplies (sid, pid, jid**, quantity) to find the sid of those that supply at least to two projects.
  2. Then join it to the Suppliers table to get the supplier same.
  1. 您只需按 Supplies (sid、pid、jid**、quantity) 分组即可找到至少供应两个项目的那些的 sid。
  2. 然后将其加入到供应商表中以获取相同的供应商。

 SELECT sid, sname from
    (SELECT sid from supplies 
    GROUP BY sid, pid 
    HAVING count(DISTINCT jid) >= 2
    ) AS T1
NATURAL JOIN 
Supliers;

It will also be faster to execute, because the join is only done when needed, not all the times.

执行速度也会更快,因为连接仅在需要时完成,而不是所有时间。

--dmg

--dmg

回答by Harry

Because we can not use Where clause with aggregate functions like count(),min(), sum()etc. so having clause came into existence to overcome this problem in sql. see example for having clause go through this link

因为我们不能使用WHERE子句与聚合函数数(),MIN(),和()等,所以having子句应运而生,以克服在SQL这个问题。请参阅让子句通过此链接的示例

http://www.sqlfundamental.com/having-clause.php

http://www.sqlfundamental.com/have-clause.php

回答by plalx

First of all, you should use the JOINsyntax rather than FROM table1, table2, and you should always limit the grouping to as little fields as you need.

首先,您应该使用JOIN语法而不是FROM table1, table2,并且您应该始终将分组限制为您需要的尽可能少的字段。

Altought I haven't tested, your first query seems fine to me, but could be re-written as:

我还没有测试过,你的第一个查询对我来说似乎很好,但可以重写为:

SELECT s.sid, s.name
FROM 
    Supplier s
    INNER JOIN (
       SELECT su.sid
       FROM Supplies su
       GROUP BY su.sid
       HAVING COUNT(DISTINCT su.jid) > 1
    ) g
        ON g.sid = s.sid

Or simplified as:

或简化为:

SELECT sid, name
FROM Supplier s
WHERE (
    SELECT COUNT(DISTINCT su.jid)
    FROM Supplies su
    WHERE su.sid = s.sid
) > 1

However, your second query seems wrong to me, because you should also GROUP BY pid.

但是,您的第二个查询对我来说似乎是错误的,因为您还应该GROUP BY pid.

 SELECT s.sid, s.name
    FROM 
        Supplier s
        INNER JOIN (
            SELECT su.sid
            FROM Supplies su
            GROUP BY su.sid, su.pid
            HAVING COUNT(DISTINCT su.jid) > 1
        ) g
            ON g.sid = s.sid

As you may have noticed in the query above, I used the INNER JOINsyntax to perform the filtering, however it can be also written as:

正如您在上面的查询中可能已经注意到的那样,我使用了INNER JOIN语法来执行过滤,但它也可以写为:

SELECT s.sid, s.name
FROM Supplier s
WHERE (
     SELECT COUNT(DISTINCT su.jid)
     FROM Supplies su
     WHERE su.sid = s.sid
     GROUP BY su.sid, su.pid
) > 1

回答by cbp

What type of sql database are using (MSSQL, Oracle etc)? I believe what you have written is correct.

使用什么类型的 sql 数据库(MSSQL、Oracle 等)?我相信你所写的是正确的。

You could also write the first query like this:

您也可以像这样编写第一个查询:

SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
       FROM Supplies su, Projects pr
       WHERE su.sid = s.sid 
           AND pr.jid = su.jid) >= 2

It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.

与尝试使用 GROUP BY 相比,它更具可读性,也更容易让人费解。性能可能有所不同。

回答by Santhosh

1.Get supplier numbers and names for suppliers of parts supplied to at least two different projects.

1.获取供应给至少两个不同项目的零件的供应商编号和名称。

 SELECT S.SID, S.NAME
 FROM SUPPLIES SP
 JOIN SUPPLIER S
 ON SP.SID = S.SID
 WHERE PID IN
 (SELECT PID FROM SUPPPLIES GROUP BY PID, JID HAVING COUNT(*) >= 2)

I am not slear about your second question

我不太清楚你的第二个问题