t-sql GROUP BY 和 COUNT,然后从 COUNT 中包含 MAX
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/9217486/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
t-sql GROUP BY with COUNT, and then include MAX from the COUNT
提问by mg1075
Suppose you had a table of "Cars" with hundreds of thousands of rows, and you wanted to do a GROUP BY:
假设您有一个包含数十万行的“汽车”表,并且您想做一个 GROUP BY:
SELECT CarID
, CarName
, COUNT(*) AS Total
FROM dbo.tbl_Cars
GROUP BY CarID
, CarName
The grouping leaves you with a result akin to:
分组结果类似于:
CarID CarName Total
1872 Olds 202,121
547841 BMW 175,298
9877 Ford 10,241
All fine and well. My question, though, is what is the best way to get the Total and the MAX Total into one table, in terms of performance and clean coding, so you have a result like:
一切都很好。不过,我的问题是,在性能和干净编码方面,将 Total 和 MAX Total 放在一张表中的最佳方法是什么,因此您会得到如下结果:
CarID CarName Total Max Total
1872 Olds 202,121 202,121
547841 BMW 175,298 202,121
9877 Ford 10,241 202,121
One approach would be to put the GROUP result into a temp table, and then get the MAX from the temp table into a local variable. But I'm wondering what the best way to do this would be.
一种方法是将 GROUP 结果放入临时表中,然后将临时表中的 MAX 放入局部变量中。但我想知道这样做的最佳方法是什么。
UPDATE
更新
The Common Table Expression seems the most elegant to write, yet similar to @EBarr, my limited testing indicates a significantly slower performance. So I won't be going with the CTE.
Common Table Expression 看起来写起来最优雅,但与 @EBarr 相似,我有限的测试表明性能显着降低。所以我不会和 CTE 一起去。
As the link @EBarr has for the COMPUTE
option indicates the feature
is deprecated, that doesn't seem the best route, either.
由于@EBarrCOMPUTE
选项的链接表明该功能已被弃用,这似乎也不是最佳途径。
The option of a local variable for the MAX value and the use of a temp table will likely be the route I go down, as I'm not aware of performance issues with it.
MAX 值的局部变量选项和临时表的使用可能是我走的路线,因为我不知道它的性能问题。
A bit more detail about my use case: it could probably end up being a series of other SO questions. But suffice to say that I'm loading a large subset of data into a temp table (so a subset of tbl_Cars is going into #tbl_Cars, and even #tbl_Cars may be further filtered and have aggregations performed on it), because I have to perform multiple filtering and aggregation queries on it within a single stored proc that returns multiple result sets.
关于我的用例的更多细节:它可能最终成为一系列其他 SO 问题。但足以说明我正在将一个大数据子集加载到临时表中(因此 tbl_Cars 的一个子集将进入 #tbl_Cars,甚至 #tbl_Cars 也可能被进一步过滤并对其执行聚合),因为我必须在返回多个结果集的单个存储过程中对其执行多个过滤和聚合查询。
UPDATE 2
更新 2
@EBarr's use of a windowed function is nice and short. Note to self:
if using a RIGHT JOIN
to an outer reference table, the COUNT()
function should select a column from tbl_Cars, not '*'
.
@EBarr 对窗口函数的使用既好又短。自我注意:如果使用RIGHT JOIN
外部引用表,COUNT()
函数应该从 tbl_Cars 中选择一列,而不是'*'
.
SELECT M.MachineID
, M.MachineType
, COUNT(C.CarID) AS Total
, MAX(COUNT(C.CarID)) OVER() as MaxTotal
FROM dbo.tbl_Cars C
RIGHT JOIN dbo.tbl_Machines M
ON C.CarID = M.CarID
GROUP BY M.MachineID
, M.MachineType
In terms of speed, it seems fine, but at what point do you have to be worried about the number of reads?
从速度上来说,似乎还好,但是什么时候需要担心读取次数呢?
回答by EBarr
Mechanically there are a few ways to do this. You could use temp tables/table variable. Another way is with nested queries and/or a CTE as @Aaron_Bertrand showed. A third way is to use WINDOWED FUNCTIONS such as...
从机械上讲,有几种方法可以做到这一点。您可以使用临时表/表变量。另一种方法是使用嵌套查询和/或 @Aaron_Bertrand 所示的 CTE。第三种方法是使用 WINDOWED FUNCTIONS,例如...
SELECT CarName,
COUNT(*) as theCount,
MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxPerGroup
FROM dbo.tbl_Cars
GROUP BY CarName
A DISFAVORED(read depricated) fourth way is using the COMPUTE keyword as such...
一个不利的(读depricated)第四种方法是使用compute关键字,这样...
SELECT CarID, CarName, Count(*)
FROM dbo.tbl_Cars
GROUP BY CarID, CarName
COMPUTE MAX(Count(*))
The COMPUTE
keyword generates totals that appear as additional summary columns at the end of the result set (see this). In the query above you will actually see two record sets.
该COMPUTE
关键字生成在结果集末尾显示为附加汇总列的总计(请参阅此)。在上面的查询中,您实际上会看到两个记录集。
Fastest
最快的
Now, the next issue is what's the "best/fastest/easiest." I immediately think of an indexed view
. As @Aaron gently reminded me, indexed views have all sorts of restrictions. The above, strategy, however, allows you to create an indexed view on the SELECT...FROM..GROUP BY. Then selecting from the indexed view apply the WINDOWED FUNCTION clause.
现在,下一个问题是什么是“最好/最快/最简单”。我立刻想到了一个indexed view
. 正如@Aaron 轻轻提醒我的那样,索引视图有各种限制。但是,上述策略允许您在 SELECT...FROM...GROUP BY 上创建索引视图。然后从索引视图中选择应用 WINDOWED FUNCTION 子句。
Without knowing more, however, about your design it is going to be difficult for anyone tell you what's best. You will get lighting fast queries from an indexed view. That performance comes at a price, though. The price is maintenance costs. If the underlying table is the target of a large amount of insert/update/delete operations the maintenance of the indexed view will bog down performance in other areas.
但是,如果不了解更多关于您的设计的信息,任何人都很难告诉您什么是最好的。您将从索引视图中获得快速的查询。不过,这种性能是有代价的。价格是维护成本。如果基础表是大量插入/更新/删除操作的目标,则索引视图的维护将降低其他方面的性能。
If you share a bit more about your use case and data access patterns people will be able to share more insight.
如果您分享更多有关您的用例和数据访问模式的信息,人们将能够分享更多见解。
MICRO PERFORMANCE TEST
微观性能测试
So I generated a little data script and looked at sql profiler numbers for the CTE performance vs windowed functions. This is a micro-test, so try some real numbers in your systemunder real load.
因此,我生成了一个小数据脚本,并查看了 CTE 性能与窗口函数的 sql 分析器编号。这是一个微测试,因此请在实际负载下在您的系统中尝试一些实数。
Data generation:
数据生成:
Create table Cars ( CarID int identity (1,1) primary key,
CarName varchar(20),
value int)
GO
insert into Cars (CarName, value)
values ('Buick', 100),
('Ford', 10),
('Buick', 300),
('Buick', 100),
('Pontiac', 300),
('Bmw', 100),
('Mecedes', 300),
('Chevy', 300),
('Buick', 100),
('Ford', 200);
GO 1000
This script generates 10,000 rows. I then ran each of the four following queries multiple times :
此脚本生成 10,000 行。然后我多次运行以下四个查询中的每一个:
--just group by
select CarName,COUNT(*) countThis
FROM Cars
GROUP BY CarName
--group by with compute (BAD BAD DEVELOPER!)
select CarName,COUNT(*) countThis
FROM Cars
GROUP BY CarName
COMPUTE MAX(Count(*));
-- windowed aggregates...
SELECT CarName,
COUNT(*) as theCount,
MAX(Count(*)) OVER(PARTITION BY 'foo') as MaxInAnyGroup
FROM Cars
GROUP BY CarName
--CTE version
;WITH x AS (
SELECT CarName,
COUNT(*) AS Total
FROM Cars
GROUP BY CarName
)
SELECT x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN (
SELECT [Max Total] = MAX(Total) FROM x
) AS x2;
After running the above queries, I created an indexed view on the "just group by" query above. Then I ran a query on the indexed view that performed a MAX(Count(*)) OVER(PARTITION BY 'foo'
.
运行上述查询后,我在上面的“just group by”查询上创建了一个索引视图。然后我在索引视图上运行了一个查询,该视图执行了MAX(Count(*)) OVER(PARTITION BY 'foo'
.
AVERAGE RESULTS
平均结果
Query CPU Reads Duration
--------------------------------------------------------
Group By 15 31 7 ms
Group & Compute 15 31 7 ms
Windowed Functions 14 56 8 ms
Common Table Exp. 16 62 15 ms
Windowed on Indexed View 0 24 0 ms
Obviously this is a micro-benchmark and only mildly instructive, so take it for what it's worth.
显然,这是一个微基准测试,只是具有轻微的指导意义,所以要物有所值。
回答by Aaron Bertrand
Here's one way:
这是一种方法:
;WITH x AS
(
SELECT CarID
, CarName
, COUNT(*) AS Total
FROM dbo.tbl_Cars
GROUP BY CarID, CarName
)
SELECT x.CarID, x.CarName, x.Total, x2.[Max Total]
FROM x CROSS JOIN
(
SELECT [Max Total] = MAX(Total) FROM x
) AS x2;
回答by mboyd
SQL Server 2008 R2 and newer versions, you can use :
SQL Server 2008 R2 及更新版本,您可以使用:
GROUP BY CarID, CarName WITH ROLLUP