Calculate a Running Total in SQL Server

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/860966/

Date: 2020-09-01 02:03:15  Source: igfitidea

Tags: sql, sql-server, tsql, running-total

Asked by codeulike

Imagine the following table (called TestTable):

id     somedate    somevalue
--     --------    ---------
45     01/Jan/09   3
23     08/Jan/09   5
12     02/Feb/09   0
77     14/Feb/09   7
39     20/Feb/09   34
33     02/Mar/09   6

I would like a query that returns a running total in date order, like:

id     somedate    somevalue  runningtotal
--     --------    ---------  ------------
45     01/Jan/09   3          3
23     08/Jan/09   5          8
12     02/Feb/09   0          8
77     14/Feb/09   7          15  
39     20/Feb/09   34         49
33     02/Mar/09   6          55

I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.

I am particularly interested in this sort of method that uses the aggregating-set-statement trick:

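-- Table variable to hold the result (column types are assumed here)
DECLARE @AnotherTbl TABLE (id int, somedate datetime, somevalue int, runningtotal int)

INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal)
   SELECT id, somedate, somevalue, null
   FROM TestTable
   ORDER BY somedate

DECLARE @RunningTotal int
SET @RunningTotal = 0

-- Assign the variable and the column in one statement, row by row
UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl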

... this is very efficient, but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.

But maybe there are other ways that people can suggest?

Edit: now with a SqlFiddle containing the setup and the 'update trick' example above.

Accepted answer by Sam Saffron

Update: if you are running SQL Server 2012, see https://stackoverflow.com/a/10309947

The problem is that the SQL Server implementation of the Over clause is somewhat limited.

Oracle (and ANSI-SQL) allow you to do things like:

 SELECT somedate, somevalue,
  SUM(somevalue) OVER(ORDER BY somedate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM Table

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

The update trick is handy but I feel it's fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also, if the query ends up being performed by two processors, I wonder what will happen; see: MAXDOP):

Full working sample:

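-- drop the temp table only if it is left over from a previous run
if object_id('tempdb..#t') is not null drop table #t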
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total)  values (2,20)
-- notice the malicious re-ordering 
insert #t(ord,total) values (1,10)
insert #t(ord,total)  values (3,10)
insert #t(ord,total)  values (4,1)

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t
order by ord 

ord         total       running_total
----------- ----------- -------------
1           10          10
2           20          30
3           10          40
4           1           41
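
The concerns above (row order and parallelism) are usually handled in community write-ups of this trick by keeping the clustered key on the ordering column, taking an exclusive table lock, and ruling out a parallel plan. The following is a minimal sketch of that idea, reusing the #t table from the sample; the hints are standard T-SQL, but they do not turn the behaviour into a documented guarantee.

```sql
-- Sketch only: the hints below reduce the known risks of the update trick,
-- but the row order is still an undocumented implementation detail.
declare @total int
set @total = 0

update #t with (tablockx)          -- exclusive table lock while the totals are written
set @total = running_total = @total + total
option (maxdop 1)                  -- single thread, so no parallel plan

select ord, total, running_total
from #t
order by ord
```

The `@total = running_total = ...` form is the same one used in the question; it writes the variable and the column in a single assignment.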

You asked for a benchmark; here is the lowdown.

The fastest SAFE way of doing this is the cursor; it is an order of magnitude faster than the correlated sub-query or the cross join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

Bottom line, for production code I would go with the cursor.

Test data:

create table #t ( ord int primary key, total int, running_total int)

set nocount on 
declare @i int
set @i = 0 
begin tran
while @i < 10000
begin
   insert #t (ord, total) values (@i,  rand() * 100) 
    set @i = @i +1
end
commit

Test 1:

SELECT ord,total, 
    (SELECT SUM(total) 
        FROM #t b 
        WHERE b.ord <= a.ord) AS b 
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135 

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal 
FROM #t a CROSS JOIN #t b 
WHERE (b.ord <= a.ord) 
GROUP BY a.ord,a.total 
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD 
FOR 
SELECT ord, total
FROM #t 
ORDER BY ord


OPEN forward_cursor 

DECLARE @running_total int, 
    @ord int, 
    @total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total 
WHILE (@@FETCH_STATUS = 0)
BEGIN
     SET @running_total = @running_total + @total
     INSERT @TotalTable VALUES(@ord, @total, @running_total)
     FETCH NEXT FROM forward_cursor INTO @ord, @total 
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t

-- CPU 0, Reads 58, Duration 139

Answer by Mikael Eriksson

In SQL Server 2012 you can use SUM() with the OVER() clause.

select id,
       somedate,
       somevalue,
       sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable

SQL Fiddle
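
One detail worth spelling out about the query above is the explicit `rows unbounded preceding` frame. When ORDER BY is given in OVER() without a frame, SQL Server 2012 uses a RANGE frame by default, so rows with the same somedate are treated as peers and all receive the same total. The sketch below illustrates the difference; the @demo table variable and its rows are hypothetical and exist only for this example.

```sql
-- Hypothetical demo data: two rows share the same date.
declare @demo table (id int, somedate date, somevalue int)
insert @demo (id, somedate, somevalue) values
    (1, '2009-01-01', 3),
    (2, '2009-01-08', 5),
    (3, '2009-01-08', 2)

select id, somedate, somevalue,
       -- default frame (RANGE): both 2009-01-08 rows show 10
       sum(somevalue) over(order by somedate) as range_total,
       -- explicit ROWS frame with a tie-breaker: 3, 8, 10
       sum(somevalue) over(order by somedate, id rows unbounded preceding) as rows_total
from @demo
order by somedate, id
```

With the sample data in the question every date is unique, so both forms return the same result there.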

Answer by Roman Pekar

While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. For those of us working with SQL Server 2008 R2 and not Denali, it's still the fastest way to get a running total: it's about 10 times faster than the cursor on my work computer for 100000 rows, and it's also an inline query.
So, here it is (I'm supposing that there's an ord column in the table and that it's a sequential number without gaps; for fast processing there should also be a unique constraint on this number):

;with 
CTE_RunningTotal
as
(
    select T.ord, T.total, T.total as running_total
    from #t as T
    where T.ord = 0
    union all
    select T.ord, T.total, T.total + C.running_total as running_total
    from CTE_RunningTotal as C
        inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)

-- CPU 140, Reads 110014, Duration 132

sql fiddle demo

Update: I was also curious about this update with variable, or quirky update. It usually works fine, but how can we be sure that it works every time? Well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258): you just check the current and previous ord and use a 1/0 assignment in case they are different from what you are expecting:

declare @total int, @ord int

select @total = 0, @ord = -1

update #t set
    @total = @total + total,
    @ord = case when ord <> @ord + 1 then 1/0 else ord end,
    ------------------------
    running_total = @total

select * from #t

-- CPU 0, Reads 58, Duration 139

From what I've seen, if you have a proper clustered index/primary key on your table (in our case it would be an index on ord_id), the update will proceed in a linear way all the time (I never encountered a divide by zero). That said, it's up to you to decide if you want to use it in production code :)

Update 2: I'm linking this answer, because it includes some useful info about the unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.

Answer by Mike Forman

The APPLY operator in SQL 2005 and higher works for this:

select
    t.id ,
    t.somedate ,
    t.somevalue ,
    rt.runningTotal
from TestTable t
 cross apply (select sum(somevalue) as runningTotal
                from TestTable
                where somedate <= t.somedate
            ) as rt
order by t.somedate

Answer by Sam Axe

SELECT TOP 25   amount, 
    (SELECT SUM(amount) 
    FROM time_detail b 
    WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a

You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.
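
As a rough sketch of that suggestion, applied to the TestTable from the question: number the rows into a temp table, then compare on that number in the inner SELECT. The names #numbered and rn are hypothetical and used only for illustration.

```sql
-- #numbered and rn are hypothetical names used only for this sketch.
select row_number() over(order by somedate, id) as rn,
       id, somedate, somevalue
into #numbered
from TestTable

select a.id, a.somedate, a.somevalue,
       (select sum(b.somevalue)
        from #numbered b
        where b.rn <= a.rn) as runningtotal
from #numbered a
order by a.rn

drop table #numbered
```

Ordering by somedate and then id gives every row a distinct position even when two rows share a date.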

Answer by KthProg

Use a correlated sub-query. Very simple, here you go:

SELECT 
somedate, 
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate

The code might not be exactly correct, but I'm sure that the idea is.

The GROUP BY is there in case a date appears more than once; you would only want to see it once in the result set.

If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:

SELECT 
id,
somedate, 
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate

Answer by A-K

You can also denormalize - store running totals in the same table:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

SELECTs work much faster than with any of the other solutions, but modifications may be slower.

Answer by shambhu yadav

If you are using SQL Server 2012 or above (LAG is not available in 2008 R2), this would be the shortest way to do it:

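-- LAG(runningtotal) reads a stored runningtotal column from the previous row,
-- so TestTable must already contain one; the first row yields NULL.
Select id
    ,somedate
    ,somevalue,
LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal
From TestTable 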

LAG is used to get the previous row's value. You can search online for more info.

Answer by araqnid

Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:

select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;

MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.

EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying -- the MSDN syntax reference suggests that it's optional, but I only have SQL Server 2000 instances around at the moment.

The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)

Answer by Dave Barker

The following will produce the required results.

SELECT a.SomeDate,
       a.SomeValue,
       SUM(b.SomeValue) AS RunningTotal
FROM TestTable a
CROSS JOIN TestTable b
WHERE (b.SomeDate <= a.SomeDate) 
GROUP BY a.SomeDate,a.SomeValue
ORDER BY a.SomeDate,a.SomeValue

Having a clustered index on SomeDate will greatly improve the performance.
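
For reference, a minimal sketch of creating such an index. The index name is hypothetical, and the statement assumes TestTable does not already have a clustered index (for example from a clustered primary key), since a table can have only one.

```sql
-- IX_TestTable_SomeDate is a hypothetical name; this assumes TestTable
-- does not already have a clustered index (a table can have only one).
create clustered index IX_TestTable_SomeDate
    on TestTable (SomeDate)
```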
