t-sql GROUP BY 和 COUNT，然后从 COUNT 中包含 MAX

Question

提问by mg1075

Suppose you had a table of "Cars" with hundreds of thousands of rows, and you wanted to do a GROUP BY:

假设您有一个包含数十万行的“汽车”表，并且您想做一个 GROUP BY：

SELECT   CarID
         , CarName
         , COUNT(*) AS Total
FROM     dbo.tbl_Cars
GROUP BY CarID
         , CarName

The grouping leaves you with a result akin to:

分组结果类似于：

CarID       CarName    Total
1872        Olds       202,121   
547841      BMW        175,298
9877        Ford        10,241

All fine and well. My question, though, is what is the best way to get the Total and the MAX Total into one table, in terms of performance and clean coding, so you have a result like:

一切都很好。不过，我的问题是，在性能和干净编码方面，将 Total 和 MAX Total 放在一张表中的最佳方法是什么，因此您会得到如下结果：

CarID       CarName    Total      Max Total
1872        Olds       202,121    202,121
547841      BMW        175,298    202,121
9877        Ford        10,241    202,121

One approach would be to put the GROUP result into a temp table, and then get the MAX from the temp table into a local variable. But I'm wondering what the best way to do this would be.

一种方法是将 GROUP 结果放入临时表中，然后将临时表中的 MAX 放入局部变量中。但我想知道这样做的最佳方法是什么。

UPDATE

更新

The Common Table Expression seems the most elegant to write, yet similar to @EBarr, my limited testing indicates a significantly slower performance. So I won't be going with the CTE.

Common Table Expression 看起来写起来最优雅，但与 @EBarr 相似，我有限的测试表明性能显着降低。所以我不会和 CTE 一起去。

As the link @EBarr has for the COMPUTEoption indicates the feature is deprecated, that doesn't seem the best route, either.

由于@EBarrCOMPUTE选项的链接表明该功能已被弃用，这似乎也不是最佳途径。

The option of a local variable for the MAX value and the use of a temp table will likely be the route I go down, as I'm not aware of performance issues with it.

MAX 值的局部变量选项和临时表的使用可能是我走的路线，因为我不知道它的性能问题。

A bit more detail about my use case: it could probably end up being a series of other SO questions. But suffice to say that I'm loading a large subset of data into a temp table (so a subset of tbl_Cars is going into #tbl_Cars, and even #tbl_Cars may be further filtered and have aggregations performed on it), because I have to perform multiple filtering and aggregation queries on it within a single stored proc that returns multiple result sets.

关于我的用例的更多细节：它可能最终成为一系列其他 SO 问题。但足以说明我正在将一个大数据子集加载到临时表中（因此 tbl_Cars 的一个子集将进入 #tbl_Cars，甚至 #tbl_Cars 也可能被进一步过滤并对其执行聚合），因为我必须在返回多个结果集的单个存储过程中对其执行多个过滤和聚合查询。

UPDATE 2

更新 2

@EBarr's use of a windowed function is nice and short. Note to self: if using a RIGHT JOINto an outer reference table, the COUNT()function should select a column from tbl_Cars, not '*'.

@EBarr 对窗口函数的使用既好又短。自我注意：如果使用RIGHT JOIN外部引用表，COUNT()函数应该从 tbl_Cars 中选择一列，而不是'*'.

SELECT       M.MachineID
             , M.MachineType
             , COUNT(C.CarID) AS Total
             , MAX(COUNT(C.CarID)) OVER() as MaxTotal
FROM         dbo.tbl_Cars C
RIGHT JOIN   dbo.tbl_Machines M
      ON     C.CarID = M.CarID
GROUP BY     M.MachineID
             , M.MachineType

In terms of speed, it seems fine, but at what point do you have to be worried about the number of reads?

从速度上来说，似乎还好，但是什么时候需要担心读取次数呢？

Answer 1

回答by EBarr

Mechanically there are a few ways to do this. You could use temp tables/table variable. Another way is with nested queries and/or a CTE as @Aaron_Bertrand showed. A third way is to use WINDOWED FUNCTIONS such as...

从机械上讲，有几种方法可以做到这一点。您可以使用临时表/表变量。另一种方法是使用嵌套查询和/或 @Aaron_Bertrand 所示的 CTE。第三种方法是使用 WINDOWED FUNCTIONS，例如...

SELECT    CarName,
          COUNT(*) as theCount,
          MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxPerGroup
FROM      dbo.tbl_Cars
GROUP BY CarName

A DISFAVORED(read depricated) fourth way is using the COMPUTE keyword as such...

一个不利的（读depricated）第四种方法是使用compute关键字，这样...

SELECT   CarID, CarName, Count(*)
FROM     dbo.tbl_Cars
GROUP BY CarID, CarName 
COMPUTE MAX(Count(*))

The COMPUTEkeyword generates totals that appear as additional summary columns at the end of the result set (see this). In the query above you will actually see two record sets.

该COMPUTE关键字生成在结果集末尾显示为附加汇总列的总计（请参阅此）。在上面的查询中，您实际上会看到两个记录集。

Fastest

最快的

Now, the next issue is what's the "best/fastest/easiest." I immediately think of an indexed view. As @Aaron gently reminded me, indexed views have all sorts of restrictions. The above, strategy, however, allows you to create an indexed view on the SELECT...FROM..GROUP BY. Then selecting from the indexed view apply the WINDOWED FUNCTION clause.

现在，下一个问题是什么是“最好/最快/最简单”。我立刻想到了一个indexed view. 正如@Aaron 轻轻提醒我的那样，索引视图有各种限制。但是，上述策略允许您在 SELECT...FROM...GROUP BY 上创建索引视图。然后从索引视图中选择应用 WINDOWED FUNCTION 子句。

Without knowing more, however, about your design it is going to be difficult for anyone tell you what's best. You will get lighting fast queries from an indexed view. That performance comes at a price, though. The price is maintenance costs. If the underlying table is the target of a large amount of insert/update/delete operations the maintenance of the indexed view will bog down performance in other areas.

但是，如果不了解更多关于您的设计的信息，任何人都很难告诉您什么是最好的。您将从索引视图中获得快速的查询。不过，这种性能是有代价的。价格是维护成本。如果基础表是大量插入/更新/删除操作的目标，则索引视图的维护将降低其他方面的性能。

If you share a bit more about your use case and data access patterns people will be able to share more insight.

如果您分享更多有关您的用例和数据访问模式的信息，人们将能够分享更多见解。

MICRO PERFORMANCE TEST

微观性能测试

So I generated a little data script and looked at sql profiler numbers for the CTE performance vs windowed functions. This is a micro-test, so try some real numbers in your systemunder real load.

因此，我生成了一个小数据脚本，并查看了 CTE 性能与窗口函数的 sql 分析器编号。这是一个微测试，因此请在实际负载下在您的系统中尝试一些实数。

Data generation:

数据生成：

Create table Cars ( CarID int identity (1,1) primary key, 
                    CarName varchar(20), 
                    value int)
GO
insert into Cars (CarName, value)
values  ('Buick', 100),
        ('Ford', 10),
        ('Buick', 300),     
        ('Buick', 100),
        ('Pontiac', 300),       
        ('Bmw', 100),
        ('Mecedes', 300),       
        ('Chevy', 300),     
        ('Buick', 100),
        ('Ford', 200);
GO 1000

This script generates 10,000 rows. I then ran each of the four following queries multiple times :

此脚本生成 10,000 行。然后我多次运行以下四个查询中的每一个：

--just group by
select  CarName,COUNT(*) countThis
FROM    Cars
GROUP BY CarName        

--group by with compute (BAD BAD DEVELOPER!)
select  CarName,COUNT(*) countThis
FROM    Cars
GROUP BY CarName        
COMPUTE  MAX(Count(*));

-- windowed aggregates...
SELECT  CarName,
        COUNT(*) as theCount,
        MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxInAnyGroup
FROM Cars
GROUP BY CarName        

--CTE version
;WITH x AS (
  SELECT   CarName,
           COUNT(*) AS Total
  FROM     Cars
  GROUP BY CarName
)
SELECT x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN (
  SELECT [Max Total] = MAX(Total) FROM x
) AS x2;

After running the above queries, I created an indexed view on the "just group by" query above. Then I ran a query on the indexed view that performed a MAX(Count(*)) OVER(PARTITION BY 'foo'.

运行上述查询后，我在上面的“just group by”查询上创建了一个索引视图。然后我在索引视图上运行了一个查询，该视图执行了MAX(Count(*)) OVER(PARTITION BY 'foo'.

AVERAGE RESULTS

平均结果

Query                      CPU       Reads     Duration   
--------------------------------------------------------
Group By                   15        31        7 ms  
Group & Compute            15        31        7 ms
Windowed Functions         14        56        8 ms 
Common Table Exp.          16        62       15 ms
Windowed on Indexed View    0        24        0 ms

Obviously this is a micro-benchmark and only mildly instructive, so take it for what it's worth.

显然，这是一个微基准测试，只是具有轻微的指导意义，所以要物有所值。

Answer 2

回答by Aaron Bertrand

Here's one way:

这是一种方法：

;WITH x AS
(
  SELECT   CarID
         , CarName
         , COUNT(*) AS Total
  FROM     dbo.tbl_Cars
  GROUP BY CarID, CarName
)
SELECT x.CarID, x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN
(
  SELECT [Max Total] = MAX(Total) FROM x
) AS x2;

Answer 3

回答by mboyd

SQL Server 2008 R2 and newer versions, you can use :

SQL Server 2008 R2 及更新版本，您可以使用：

GROUP BY CarID, CarName WITH ROLLUP

t-sql GROUP BY 和 COUNT，然后从 COUNT 中包含 MAX

提问by mg1075

回答by EBarr

回答by Aaron Bertrand

回答by mboyd

相关推荐

最近更新

标签

t-sql GROUP BY 和 COUNT，然后从 COUNT 中包含 MAX

提问by mg1075

回答by EBarr

回答by Aaron Bertrand

回答by mboyd

相关推荐

将单个 sql server 行存储在变量中，然后使用列值构建查询

如何查找可用于 SQL Server 的 OLE 提供程序？

如何从变量运行超过 8000 个字符的 SQL 语句？

当数据包含逗号时将 sql 假脱机为 CSV

相关推荐

最近更新

标签