Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22841206/

Time: 2020-09-01 01:34:07  Source: igfitidea

Calculating Cumulative Sum in PostgreSQL

Tags: sql, postgresql, window-functions, analytic-functions, cumulative-sum

Asked by Yousuf Sultan

I want to compute the cumulative (running) sum of a field and insert it from a staging table into a target table. My staging table looks something like this:

ea_month    id       amount    ea_year    circle_id
April       92570    1000      2014        1
April       92571    3000      2014        2
April       92572    2000      2014        3
March       92573    3000      2014        1
March       92574    2500      2014        2
March       92575    3750      2014        3
February    92576    2000      2014        1
February    92577    2500      2014        2
February    92578    1450      2014        3          

I want my target table to look something like this:

ea_month    id       amount    ea_year    circle_id    cum_amt
February    92576    2000      2014        1           2000
March       92573    3000      2014        1           5000
April       92570    1000      2014        1           6000
February    92577    2500      2014        2           2500
March       92574    2500      2014        2           5000
April       92571    3000      2014        2           8000
February    92578    1450      2014        3           1450
March       92575    3750      2014        3           5200
April       92572    2000      2014        3           7200

I am really confused about how to achieve this result. I want to do it in PostgreSQL.

Can anyone suggest how to go about achieving this result-set?

Answered by Erwin Brandstetter

Basically, you need a window function here. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.

The special difficulty here is to get partitions and sort order right:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY ea_year, ea_month) AS cum_amt
FROM   tbl
ORDER  BY circle_id, ea_year, ea_month;

And no GROUP BY here.

The sum for each row is calculated from the first row in the partition to the current row - or, quoting the manual to be precise:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

... which is the cumulative or running sum you are after.

Rows with the same (circle_id, ea_year, ea_month) are "peers" in this query. All of those show the same running sum, with all peers added to the sum. But if your table is UNIQUE on (circle_id, ea_year, ea_month), as I assume, the sort order is deterministic and no row has peers.
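
To make the effect of peers concrete, here is a minimal sketch on made-up data (the column k and the values are hypothetical, not from the question). Two rows share the sort key k = 2, so they are peers. The query compares the default RANGE frame with an explicit ROWS frame:

```sql
-- Hypothetical demo data: the two rows with k = 2 are peers under ORDER BY k.
SELECT k, amount
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     , sum(amount) OVER (ORDER BY k) AS range_sum
       -- explicit ROWS frame: stops at the current row, ignoring later peers
     , sum(amount) OVER (ORDER BY k
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM  (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS t(k, amount)
ORDER  BY k, amount;
```

With the default frame, both peers get range_sum = 60 (10 + 20 + 30). With the ROWS frame the two peers get different values, but which peer gets which value depends on the order the window happens to process them in, and that order is arbitrary. This is why a unique sort key matters for a well-defined running sum.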

Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
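
A quick way to see the problem, sketched with the three month names from the question as inline values:

```sql
-- Month names stored as strings sort alphabetically, not chronologically.
SELECT ea_month
FROM  (VALUES ('February'), ('March'), ('April')) AS t(ea_month)
ORDER  BY ea_month;
```

In typical locales this returns April, February, March, so a running sum ordered this way would start with April instead of February.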

If you have actual date values stored in your table you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column mon of type date in your table.

  • Transform what you have with to_date():

    to_date(ea_year || ea_month , 'YYYYMonth') AS mon
    
  • For display you can get original strings with to_char():

    to_char(mon, 'Month') AS ea_month
    to_char(mon, 'YYYY') AS ea_year
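
One possible way to carry out that migration is sketched below. It assumes the table is named tbl, as in the queries in this answer, and that every row holds a valid year and a full English month name; adapt it to your actual schema before running it.

```sql
-- Sketch only: table name tbl and the column types are assumptions.
BEGIN;

ALTER TABLE tbl ADD COLUMN mon date;

-- Derive the first day of each month from the existing columns.
UPDATE tbl
SET    mon = to_date(ea_year || ea_month, 'YYYYMonth');

-- Once mon is populated, the two original columns are redundant.
ALTER TABLE tbl
  ALTER COLUMN mon SET NOT NULL
, DROP  COLUMN ea_year
, DROP  COLUMN ea_month;

COMMIT;
```

Note that to_char(mon, 'Month') blank-pads the name to 9 characters; to_char(mon, 'FMMonth') suppresses the padding if you need the bare month name for display.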
    
While stuck with the unfortunate layout, this will work:

SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
FROM   (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon FROM tbl) AS sub
ORDER  BY circle_id, mon;
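
The question also asks to insert the result from staging into a target table. A minimal sketch of that step follows; the table names staging and target are hypothetical, and the target is assumed to have the six columns shown in the question:

```sql
-- Hypothetical table names: staging (source) and target (destination).
INSERT INTO target (ea_month, id, amount, ea_year, circle_id, cum_amt)
SELECT ea_month, id, amount, ea_year, circle_id
     , sum(amount) OVER (PARTITION BY circle_id ORDER BY mon) AS cum_amt
FROM  (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS mon
       FROM   staging) AS sub;
```

The outer ORDER BY is left out here because the physical order of inserted rows carries no meaning; apply ORDER BY when you read from the target table instead.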