Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow page: http://stackoverflow.com/questions/1168041/

SQL Count(*) and Group By - Find Difference Between Rows

sql

Asked by Remus Rusanu

Below is a SQL query I wrote to find the total number of rows by each Product ID (proc_id):

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
ORDER BY proc_id;

Below is the result of the SQL query above:

proc_id count(*)
01  626
02  624
03  626
04  624
05  622
06  624
07  624
09  624

Notice that the total counts for proc_id = '01', proc_id = '03', and proc_id = '05' are different (not equal to the 624 rows that the other proc_id values have).

How do I write a SQL query to find which proc_id rows are different for proc_id = '01', proc_id = '03', and proc_id = '05' as compared to the other proc_id?

Answer by Remus Rusanu

First you need to define the criterion that makes '624' correct. Is it the average count(*)? Is it the count(*) that occurs most often? Is it your favorite count(*)?

Then you can use the HAVING clause to separate the ones that don't match your criteria:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

or:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> (
  <insert here a subquery that produces the magic '624'>
 )
ORDER BY proc_id;
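
As a hedged sketch of what that subquery could look like, the query below assumes the "correct" count is the one shared by the most proc_id values (the mode). The CTE names counts and freq and the aliases row_count and n_ids are made up for this illustration, ties are broken by taking the smaller count, and common table expressions need a reasonably recent database version.

```sql
-- Assumption: the expected count is the most frequent per-proc_id count.
WITH counts AS (
  -- One row per proc_id with its row count
  SELECT proc_id, COUNT(*) AS row_count
  FROM proc
  WHERE grouping_primary = 'SLB'
  AND   eff_date = '01-JUL-09'
  GROUP BY proc_id
),
freq AS (
  -- How many proc_id values share each row count
  SELECT row_count, COUNT(*) AS n_ids
  FROM counts
  GROUP BY row_count
)
SELECT c.proc_id, c.row_count
FROM counts c
WHERE c.row_count <> (
  -- The most common row count; MIN breaks ties
  SELECT MIN(f.row_count)
  FROM freq f
  WHERE f.n_ids = (SELECT MAX(n_ids) FROM freq)
)
ORDER BY c.proc_id;
```

With the counts shown in the question, 624 is the most frequent value, so this would list proc_id 01, 03, and 05.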

Answer by David M

If you know 624 is the magic number:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

Answer by northpole

Try this:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
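HAVING count(*) <> (select count(*)
                    from proc z
                    where z.grouping_primary = 'SLB'
                    and   z.eff_date = '01-JUL-09'
                    and   z.proc_id = '02')  -- reference proc_id assumed to have the expected count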
ORDER BY proc_id;

Answer by Charles Bretana

You can't do this. Some procIds simply have fewer rows. In other words, the rows that keep such a procId from reaching a count of 624 are rows that DO NOT EXIST. How can any query show those rows?

For the procIds that have too many rows, IF (and this is a big if) all the rows in the 624 for the other procIds share some attribute with a 624-row subset of the sets that are too large, then you might be able to identify the "extra" rows. But there is no way to identify missing rows; all you can do is identify which procIds have too many rows or too few.
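
A minimal sketch of that last step, labelling each procId as having too many or too few rows, might look like this. It assumes 624 is the expected count; the aliases row_count, difference, and status are invented for the example.

```sql
-- Assumption: 624 is the expected number of rows per proc_id.
SELECT proc_id,
       COUNT(*)       AS row_count,
       COUNT(*) - 624 AS difference,   -- positive: extra rows, negative: missing rows
       CASE WHEN COUNT(*) > 624 THEN 'too many'
            ELSE 'too few'
       END            AS status
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING COUNT(*) <> 624
ORDER BY proc_id;
```

With the counts from the question, this would report 01 and 03 as 'too many' (difference 2) and 05 as 'too few' (difference -2).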

Answer by Mark Brackett

If I understand your question correctly (which is differently from the other posted answers), you want the rows that make proc_id 01 different? If that's the case, you need to join on all the columns that should be the same and look for the differences. So, to compare 01 with 02:

 SELECT [01].*, [02].* --select both sides, since either one may be all NULL
 FROM (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '01'
 ) as [01]
 FULL JOIN (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '02'
 ) as [02] ON
    [01].col1 = [02].col1
    AND [01].col2 = [02].col2
    AND [01].col3 = [02].col3
    /* etc...just don't include proc_id */
 WHERE
    [01].proc_id IS NULL --row exists only in [02]
    OR [02].proc_id IS NULL --row exists only in [01]
 ;
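
A more compact way to express the same comparison in one direction, assuming the database supports EXCEPT (SQL Server and PostgreSQL do; older Oracle versions spell it MINUS), is a set difference. This is only a sketch: col1, col2, and col3 are the same placeholder columns used in the join above.

```sql
-- Rows present for proc_id '01' that have no identical counterpart for '02'.
-- proc_id itself is left out of the select list so the two sets are comparable.
SELECT col1, col2, col3
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   proc_id = '01'
EXCEPT
SELECT col1, col2, col3
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   proc_id = '02';
```

Swapping the two SELECT statements gives the other direction. Note that EXCEPT removes duplicate rows and treats NULLs as equal, which differs from the join's behavior.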

I'm pretty sure MS SQL Server has a row hash function that may make it easier if you have a bunch of columns... but I can't think of the name of it.
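
The function half-remembered here is probably CHECKSUM or BINARY_CHECKSUM, both of which exist in SQL Server. The sketch below assumes SQL Server and reuses the placeholder columns col1, col2, and col3. Different rows can occasionally produce the same checksum, so treat a match as a hint rather than proof.

```sql
-- Rows of proc_id '01' whose checksum over the compared columns
-- does not appear among the rows of proc_id '02' (SQL Server only).
SELECT a.*
FROM proc a
WHERE a.grouping_primary = 'SLB'
AND   a.eff_date = '01-JUL-09'
AND   a.proc_id = '01'
AND   BINARY_CHECKSUM(a.col1, a.col2, a.col3) NOT IN (
        SELECT BINARY_CHECKSUM(b.col1, b.col2, b.col3)
        FROM proc b
        WHERE b.grouping_primary = 'SLB'
        AND   b.eff_date = '01-JUL-09'
        AND   b.proc_id = '02')
ORDER BY a.col1, a.col2, a.col3;
```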

Answer by Mark Brackett

Well, to find the extra rows you would use a NOT IN clause. To find the missing rows you would need to reverse the logic. This naturally assumes that all 624 rows are the same from proc_id to proc_id.

SELECT proc_id, varying_column
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   varying_column NOT IN (SELECT b.varying_column
                             FROM proc b
                             WHERE b.grouping_primary = 'SLB'
                             AND   b.eff_date = '01-JUL-09'
                             -- baseline: the lowest proc_id that has exactly 624 rows
                             AND   b.proc_id = (SELECT MIN(g.proc_id)
                                                FROM (SELECT a.proc_id
                                                      FROM proc a
                                                      WHERE a.grouping_primary = 'SLB'
                                                      AND   a.eff_date = '01-JUL-09'
                                                      GROUP BY a.proc_id
                                                      -- aggregate filters belong in HAVING, not WHERE
                                                      HAVING COUNT(*) = 624) g))
ORDER BY proc_id, varying_column;
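
Reversing the logic to find the missing rows could be sketched as follows. This assumes proc_id '02' is a trustworthy baseline with the full set of rows and that varying_column identifies a row within a proc_id; the aliases ids, base, and p are made up for the example.

```sql
-- For every proc_id, list the baseline values of varying_column it lacks.
SELECT ids.proc_id, base.varying_column
FROM (SELECT DISTINCT proc_id
      FROM proc
      WHERE grouping_primary = 'SLB'
      AND   eff_date = '01-JUL-09') ids
CROSS JOIN
     (SELECT varying_column
      FROM proc
      WHERE grouping_primary = 'SLB'
      AND   eff_date = '01-JUL-09'
      AND   proc_id = '02') base          -- assumed baseline proc_id
WHERE NOT EXISTS (SELECT 1
                  FROM proc p
                  WHERE p.grouping_primary = 'SLB'
                  AND   p.eff_date = '01-JUL-09'
                  AND   p.proc_id = ids.proc_id
                  AND   p.varying_column = base.varying_column)
ORDER BY ids.proc_id, base.varying_column;
```

NOT EXISTS is used here instead of NOT IN because NOT IN returns no rows at all when the subquery yields a NULL.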