SQL: Count cumulative total in PostgreSQL
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5698452/
Asked by khairul
I am using COUNT and GROUP BY to get the number of subscribers registered each day:
SELECT created_at, COUNT(email)
FROM subscriptions
GROUP BY created_at;
Result:
created_at count
-----------------
04-04-2011 100
05-04-2011 50
06-04-2011 50
07-04-2011 300
Instead, I want the cumulative total of subscribers as of each day. How do I get this?
created_at count
-----------------
04-04-2011 100
05-04-2011 150
06-04-2011 200
07-04-2011 500
Answer by intgr
With larger datasets, window functions are the most efficient way to perform these kinds of queries: the table is scanned only once, instead of once for each date as a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up support window functions.
This is what it looks like:
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;
Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.
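As a rough, runnable illustration of that running sum, the sketch below loads the question's sample counts into an in-memory SQLite database. SQLite is only a stand-in for PostgreSQL here, chosen so the example runs without a server; it needs SQLite 3.25 or newer for window functions. The helper name cumulative_by_day, the ISO-formatted dates, and the generated addresses are assumptions made for illustration, and the daily counts are computed in a subquery rather than by nesting count inside sum.

```python
import sqlite3


def cumulative_by_day(rows):
    """rows: iterable of (created_at, email). Returns [(created_at, running_total), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT created_at,
               SUM(daily) OVER (ORDER BY created_at) AS running_total
        FROM (
            -- one row per day with that day's count
            SELECT created_at, COUNT(email) AS daily
            FROM subscriptions
            GROUP BY created_at
        )
        ORDER BY created_at
        """
    ).fetchall()
    conn.close()
    return result


# Sample data shaped like the question's table (made-up addresses, ISO dates).
daily_signups = {"2011-04-04": 100, "2011-04-05": 50, "2011-04-06": 50, "2011-04-07": 300}
rows = [
    (day, f"user-{day}-{i}@example.com")
    for day, n in daily_signups.items()
    for i in range(n)
]
totals = cumulative_by_day(rows)
for created_at, running_total in totals:
    print(created_at, running_total)
```

The printed totals should line up with the cumulative table the question asks for.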
Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.
If you want to remove all duplicates, I think the easiest way is to use a subquery and DISTINCT ON. This will attribute each email to its earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
SELECT DISTINCT ON (email) created_at, email
FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;
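To make the difference between the two kinds of de-duplication concrete, here is a small pure-Python sketch. It does not run the SQL above; it only mimics the counting rules on a handful of made-up rows. The function names and the sample addresses are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_distinct_per_day(rows):
    """Mimics sum(count(distinct email)): duplicates are removed only within each day."""
    emails_by_day = {}
    for created_at, email in rows:
        emails_by_day.setdefault(created_at, set()).add(email)
    days = sorted(emails_by_day)
    totals = accumulate(len(emails_by_day[day]) for day in days)
    return list(zip(days, totals))


def cumulative_first_seen(rows):
    """Mimics the DISTINCT ON (email) subquery: each email counts once, on its earliest date."""
    first_seen = {}
    for created_at, email in rows:
        if email not in first_seen or created_at < first_seen[email]:
            first_seen[email] = created_at
    new_per_day = Counter(first_seen.values())
    days = sorted(new_per_day)
    totals = accumulate(new_per_day[day] for day in days)
    return list(zip(days, totals))


# Made-up rows: "a" signs up twice on day one and again on day two.
sample = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-05", "a@example.com"),
    ("2011-04-05", "c@example.com"),
]
print(cumulative_distinct_per_day(sample))
print(cumulative_first_seen(sample))
```

With these rows the per-day version still counts "a" on both days, while the first-seen version counts it only on its earliest date. Note that a day with no new emails does not appear at all in the first-seen result, just as it would have no group in the SQL query.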
If you create an index on (email, created_at), this query shouldn't be too slow either.
(If you want to test, this is how I created the sample dataset:)
create table subscriptions as
select date '2000-04-04' + (i/10000)::int as created_at,
'user@example.com' || (i%700000)::text as email  -- placeholder address; any string literal works here
from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
Answer by OMG Ponies
Use:
SELECT DISTINCT a.created_at,  -- DISTINCT keeps one row per date instead of one per subscription
       (SELECT COUNT(b.email)
        FROM SUBSCRIPTIONS b
        WHERE b.created_at <= a.created_at) AS count
FROM SUBSCRIPTIONS a
ORDER BY a.created_at;
Answer by Andriy M
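SELECT
  s1.created_at,
  COUNT(s2.email) AS cumul_count
-- take each date once, so the join does not multiply the count by the number of rows on that date
FROM (SELECT DISTINCT created_at FROM subscriptions) s1
INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at
ORDER BY s1.created_at;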
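Both the correlated-subquery form and the self-join form revisit earlier rows for every date, which is the cost the window-function answer points out. The following is only a loose Python analogy of that difference, using the per-day counts from the question; the function names are hypothetical and nothing here measures real PostgreSQL behavior.

```python
from itertools import accumulate


def running_totals_rescan(daily_counts):
    """For each day, add up every count up to that day again (like a per-date subquery)."""
    return [sum(daily_counts[: i + 1]) for i in range(len(daily_counts))]


def running_totals_single_pass(daily_counts):
    """Carry one running sum forward (like a window SUM over ordered rows)."""
    return list(accumulate(daily_counts))


daily_counts = [100, 50, 50, 300]  # per-day counts from the question
print(running_totals_rescan(daily_counts))
print(running_totals_single_pass(daily_counts))
```

Both functions return the same totals; the first does work that grows with the square of the number of days, while the second touches each count once.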
Answer by Endy Tjahjono
I assume you want only one row per day, and that you still want to show days without any subscriptions (if nobody subscribes on a certain date, do you want to show that date with the previous day's balance?). If so, you can use the WITH feature:
with recursive serialdates(adate) as (
select cast('2011-04-04' as date)
union all
select adate + 1 from serialdates where adate < cast('2011-04-07' as date)
)
select D.adate,
(
select count(distinct email)
from subscriptions
where created_at between date_trunc('month', D.adate) and D.adate
)
from serialdates D
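For readers who want to try the gap-filling idea without a PostgreSQL server, here is a hedged sketch of the same shape of query run against an in-memory SQLite database. SQLite is a stand-in only: it uses date(adate, '+1 day') and date(adate, 'start of month') where the PostgreSQL query uses adate + 1 and date_trunc. The function name cumulative_with_gaps, the ISO date strings, and the sample rows are assumptions for illustration.

```python
import sqlite3


def cumulative_with_gaps(rows, first_day, last_day):
    """Return one (date, month-to-date distinct emails) row per day in the range,
    including days on which nobody subscribed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH RECURSIVE serialdates(adate) AS (
            SELECT ?
            UNION ALL
            SELECT date(adate, '+1 day') FROM serialdates WHERE adate < ?
        )
        SELECT d.adate,
               (SELECT COUNT(DISTINCT email)
                FROM subscriptions
                WHERE created_at BETWEEN date(d.adate, 'start of month') AND d.adate)
        FROM serialdates d
        ORDER BY d.adate
        """,
        (first_day, last_day),
    ).fetchall()
    conn.close()
    return result


# Made-up rows with no signups on 2011-04-05 or 2011-04-07.
gap_rows = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-06", "c@example.com"),
]
filled = cumulative_with_gaps(gap_rows, "2011-04-04", "2011-04-07")
for adate, total in filled:
    print(adate, total)
```

Days without signups repeat the previous total. As in the query above, the count restarts at the beginning of each month because of the lower bound on created_at.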
Answer by mentat
The best way is to have a calendar table: calendar (date date, month int, quarter int, half int, week int, year int)
Then you can join against this table to summarize by whichever field you need.
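One possible reading of this suggestion is sketched below, again with in-memory SQLite standing in for PostgreSQL. The calendar here carries only a subset of the listed columns (date, month, quarter, year), and the helper names build_demo_db and signups_per, the sample rows, and the date range are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Subset of the calendar columns that this sketch populates.
SUMMARY_FIELDS = {"date", "month", "quarter", "year"}


def build_demo_db(first_day, last_day, rows):
    """Create an in-memory database with a calendar table and a subscriptions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calendar (date TEXT PRIMARY KEY, month INT, quarter INT, year INT)"
    )
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    day = first_day
    while day <= last_day:
        conn.execute(
            "INSERT INTO calendar VALUES (?, ?, ?, ?)",
            (day.isoformat(), day.month, (day.month - 1) // 3 + 1, day.year),
        )
        day += timedelta(days=1)
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    return conn


def signups_per(conn, field):
    """Count subscriptions per calendar field; periods without signups show 0."""
    if field not in SUMMARY_FIELDS:  # guard, because the name is interpolated into the SQL
        raise ValueError(f"unknown calendar field: {field}")
    query = f"""
        SELECT c.{field}, COUNT(s.email)
        FROM calendar c
        LEFT JOIN subscriptions s ON s.created_at = c.date
        GROUP BY c.{field}
        ORDER BY c.{field}
    """
    return conn.execute(query).fetchall()


conn = build_demo_db(
    date(2011, 3, 30),
    date(2011, 4, 7),
    [
        ("2011-03-31", "a@example.com"),
        ("2011-04-04", "b@example.com"),
        ("2011-04-04", "c@example.com"),
    ],
)
by_month = signups_per(conn, "month")
by_day = signups_per(conn, "date")
print(by_month)
print(by_day)
```

The LEFT JOIN from the calendar side is what keeps empty days in the result. Grouping by month alone would merge the same month from different years, so a range spanning several years would need year in the grouping as well.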