SQL 何时使用 GROUPING SETS、CUBE 和 ROLLUP

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/25274879/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 02:27:45  来源:igfitidea点击:

When to use GROUPING SETS, CUBE and ROLLUP

sqlsql-servergroupingcuberollup

提问by Yevgraf Andreyevich Zhivago

I have recently learned about GROUPING SETS, CUBE and ROLLUP for defining multiple grouping sets in sql server.

我最近了解了用于在 sql server 中定义多个分组集的 GROUPING SETS、CUBE 和 ROLLUP。

What I am asking is under what circumstances do we use these features ? What are the benefits and advantages of using them?

我要问的是我们在什么情况下使用这些功能?使用它们有什么好处和优势?

SELECT shipperid, YEAR(shippeddate) AS shipyear, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY GROUPING SETS ( ( shipperid, YEAR(shippeddate) ), ( shipperid ), ( YEAR(shippeddate) ), ( ) );


SELECT shipperid, YEAR(shippeddate) AS shipyear, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY CUBE( shipperid, YEAR(shippeddate) );


SELECT shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY ROLLUP( shipcountry, shipregion, shipcity );

采纳答案by SurroundedByFish

Firstly, for those who haven't already read up on the subject:

首先,对于那些还没有读过这个主题的人:

That being said, don't think about these grouping options as ways to get a result set. These are performance tools.

话虽如此,不要将这些分组选项视为获取结果集的方法。这些是性能工具

Let's take ROLLUPas a simple example.

我们ROLLUP举一个简单的例子。

I can use the following query to get the count of records for each value of GrpCol.

我可以使用以下查询来获取 GrpCol 的每个值的记录数。

SELECT   GrpCol, count(*) AS cnt
FROM     dbo.MyTable
GROUP BY GrpCol

And I can use the following query to summarily "roll up" the count of ALL records.

我可以使用以下查询来汇总“汇总”所有记录的计数。

SELECT   NULL, count(*) AS cnt
FROM     dbo.MyTable

And I could UNION ALLthe above two queries to get the exact same results I might get if I had written the first query with the ROLLUPclause (that's why I put the NULL in there).

UNION ALL如果我用ROLLUP子句编写第一个查询,我可以通过上述两个查询获得完全相同的结果(这就是我将 NULL 放在那里的原因)。

It might actually be more convenient for me to execute this as two different queries because then I have the grouped results separate from my totals. Why would I want my final total mixed right in to the rest of those results? The answer is that doing both together using the ROLLUPclause is more efficient. SQL Server will use an execution plan that calculates all of the aggregations together in one pass. Compare that to the UNION ALLexample which would provide the exact same results but use a less efficient execution plan (two table scans instead of one).

对于我来说,将它作为两个不同的查询来执行实际上可能更方便,因为这样我将分组结果与我的总数分开。为什么我希望我的最终总数与其余的结果混合在一起?答案是使用该ROLLUP子句将两者结合起来更有效。SQL Server 将使用一个执行计划,一次计算所有聚合。将其与UNION ALL提供完全相同结果但使用效率较低的执行计划(两次表扫描而不是一次表扫描)的示例进行比较。

Imagine an extreme example in which you are working on a data set so large that each scan of the data takes one whole hour. You have to provide totals on basically every possible dimension (way to slice) that data every day. Aha! I bet one of these grouping options is exactly what you need. If you save off the results of that one scan into a special schema layout, you will then be able to run reports for the rest of the day off the saved results.

想象一个极端的例子,其中您正在处理一个如此大的数据集,以至于每次扫描数据都需要整整一个小时。您必须每天提供基本上每个可能维度(切片方式)该数据的总计。啊哈!我敢打赌,这些分组选项之一正是您所需要的。如果您将一次扫描的结果保存到一个特殊的模式布局中,那么您将能够在保存的结果之外运行当天剩余时间的报告。

So I'm basically saying that you're working on a data warehouse project. For the rest of us it mostly falls into the "neat thing to know" category.

所以我基本上是说您正在从事数据仓库项目。对于我们其他人来说,它主要属于“要知道的事情”类别。

回答by Nizam

The CUBEis the same of GROUPING SETSwith all possible combinations.

CUBE相同的GROUPING SETS所有可能的组合。

So this (using CUBE)

所以这(使用CUBE

GROUP BY CUBE (C1, C2, C3, ..., Cn-2, Cn-1, Cn)

is the same of this (using GROUPING SETS)

与此相同(使用GROUPING SETS

GROUP BY GROUPING SETS (
     (C1, C2, C3, ..., Cn-2, Cn-1, Cn) -- All dimensions are included.
    ,( , C2, C3, ..., Cn-2, Cn-1, Cn) -- n-1 dimensions are included.
    ,(C1, C3, ..., Cn-2, Cn-1, Cn)
    …
    ,(C1, C2, C3, ..., Cn-2, Cn-1,)
    ,(C3, ..., Cn-2, Cn-1, Cn) -- n-2 dimensions included
    ,(C1  ..., Cn-2, Cn-1, Cn)
    …
    ,(C1, C2) -- 2 dimensions are included.
    ,…
    ,(C1, Cn)
    ,…
    ,(Cn-1, Cn)
    ,…
    ,(C1) -- 1 dimension included
    ,(C2)
    ,…
    ,(Cn-1)
    ,(Cn)
    ,() ) -- Grand total, 0 dimension is included.

Then, if you don't really need all combinations, you should use GROUPING SETSrather than CUBE

然后,如果你真的不需要所有的组合,你应该使用GROUPING SETS而不是CUBE

ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross tabulation reports. A ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy.

ROLLUP 和 CUBE 运算符生成一些相同的结果集并执行一些与 OLAP 应用程序相同的计算。CUBE 运算符生成可用于交叉制表报告的结果集。ROLLUP 操作可以计算 OLAP 维度或层次结构的等效项。

Look here to see Grouping Sets Equivalents

看这里看分组集等价物



UPDATE

更新

I think an example would help here. Suppose you have a table of number of UFOs sightings by country and gender, like bellow:

我认为一个例子会有所帮助。假设您有一个按国家和性别划分的 UFO 目击数量表,如下所示:

╔═════════╦═══════╦═════════╗
║ COUNTRY ║ GENDER║ #SIGHTS ║
╠═════════╬═══════╬═════════╣
║ USA     ║ F     ║     450 ║
║ USA     ║ M     ║    1500 ║
║ ITALY   ║ F     ║     704 ║
║ ITALY   ║ M     ║     720 ║
║ SWEDEN  ║ F     ║     317 ║
║ SWEDEN  ║ M     ║     310 ║
║ BRAZIL  ║ F     ║     144 ║
║ BRAZIL  ║ M     ║     159 ║
╚═════════╩═══════╩═════════╝

Then, if you want to know the totals for each country, by gender and grand total only, you should use GROUPING SETS

然后,如果您想知道每个国家的总数,仅按性别和总计,您应该使用 GROUPING SETS

 select Country, Gender, sum(Number_Of_Sights)
 from Table1
 group by GROUPING SETS((Country), (Gender), ())
 order by Country, Gender

SQL Fiddle

SQL小提琴

To get the same result with GROUP BY, you would use UNION ALLas:

要获得与 相同的结果GROUP BY,您可以使用UNION ALL

select Country, NULL Gender, sum(Number_Of_Sights)
from Table1
GROUP BY Country
UNION ALL
select NULL Country, Gender, sum(Number_Of_Sights)
from Table1
GROUP BY GENDER
UNION ALL
SELECT NULL Country, NULL Gender, sum(Number_Of_Sights)
FROM TABLE1
ORDER BY COUNTRY, GENDER

SQL Fiddle

SQL小提琴

But it is not possible to obtain the same result with CUBE, since it will return all possibilities.

但是用 CUBE 不可能得到相同的结果,因为它会返回所有的可能性。

Now, if you want to know all possible combinations, then you should use CUBE

现在,如果你想知道所有可能的组合,那么你应该使用 CUBE

回答by Rob Farley

I find they are good when you're producing a report and the result is not something which can be rolled up within the client.

我发现它们在您生成报告时很好,并且结果不是可以在客户端内汇总的东西。

For example, if you're doing something with COUNT(DISTINCT...)then the result across a larger group is not necessarily the same value as the sum of the parts. Eg, across two individual days you might have 1500 visitors and 2000 visitors, but the total could be anywhere between 2000 and 3500, depending on the overlap. It's nice to do this in the client, but because the client can't tell what the overlap is, you can use GROUPING SETSto provide the answer (and then handle that extra row coming through in the report).

例如,如果您正在做某事,COUNT(DISTINCT...)那么更大组的结果不一定与各部分的总和相同。例如,在单独的两天内,您可能有 1500 位访客和 2000 位访客,但总数可能介于 2000 和 3500 之间,具体取决于重叠情况。在客户端执行此操作很好,但是因为客户端无法分辨重叠是什么,您可以使用GROUPING SETS来提供答案(然后处理报告中出现的额外行)。