Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/10544182/

Time: 2020-10-20 23:54:40  Source: igfitidea

Query for count of distinct values in a rolling date range

Tags: sql, postgresql, date, count

Asked by harold

I have a data set of email addresses and the dates on which those email addresses were added to a table. There can be multiple entries of an email address for various different dates. For example, given the data set below, I would like to get each date together with the count of distinct emails that we have between said date and 3 days ago.

Date   | email  
-------+----------------
1/1/12 | [email protected]
1/1/12 | [email protected]
1/1/12 | [email protected]
1/2/12 | [email protected]
1/2/12 | [email protected]
1/3/12 | [email protected]
1/4/12 | [email protected]
1/5/12 | [email protected]
1/5/12 | [email protected]
1/6/12 | [email protected]
1/6/12 | [email protected]
1/6/12 | [email protected]

The result set would look something like this if we use a date period of 3:

date   | count(distinct email)
-------+------
1/1/12 | 3
1/2/12 | 3
1/3/12 | 3
1/4/12 | 3
1/5/12 | 2
1/6/12 | 2

I can get a distinct count for a date range using the query below, but I am looking to get the count for a range by day, so that I do not have to manually update the range for hundreds of dates.

select test.date, count(distinct test.email)  
from test_table as test  
where test.date between '2012-01-01' and '2012-05-08'  
group by test.date;

Help is appreciated.

Answered by Erwin Brandstetter

Test case:

CREATE TEMP TABLE tbl (day date, email text);
INSERT INTO tbl VALUES
 ('2012-01-01', '[email protected]')
,('2012-01-01', '[email protected]')
,('2012-01-01', '[email protected]')
,('2012-01-02', '[email protected]')
,('2012-01-02', '[email protected]')
,('2012-01-03', '[email protected]')
,('2012-01-04', '[email protected]')
,('2012-01-05', '[email protected]')
,('2012-01-05', '[email protected]')
,('2012-01-06', '[email protected]')
,('2012-01-06', '[email protected]')
,('2012-01-06', '[email protected]');

Query - returns only days where an entry exists in tbl:

SELECT day
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  day BETWEEN t.day - 2 AND t.day -- period of 3 days
      ) AS dist_emails
FROM   tbl t
WHERE  day BETWEEN '2012-01-01' AND '2012-01-06'  
GROUP  BY 1
ORDER  BY 1;

Or - return all days in the specified range, even if there are no rows for the day:

SELECT day
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  day BETWEEN g.day - 2 AND g.day
      ) AS dist_emails
FROM  (SELECT generate_series('2012-01-01'::date
                            , '2012-01-06'::date, '1d')::date) AS g(day);

Result:

day        | dist_emails
-----------+------------
2012-01-01 | 3
2012-01-02 | 3
2012-01-03 | 3
2012-01-04 | 3
2012-01-05 | 1
2012-01-06 | 2
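
The exact email addresses are not essential to the technique. As a minimal, self-contained sketch, the query below runs the same correlated subquery over inline sample rows. The CTE name sample and the addresses a@example.com, b@example.com and c@example.com are hypothetical placeholders, picked here so that a 3-day window gives the counts shown in the result above (3, 3, 3, 3, 1, 2).

```sql
-- Hypothetical sample rows; the addresses are placeholders, not the original data.
WITH sample(day, email) AS (
   VALUES
     ('2012-01-01'::date, 'a@example.com')
   , ('2012-01-01'::date, 'b@example.com')
   , ('2012-01-01'::date, 'c@example.com')
   , ('2012-01-02'::date, 'b@example.com')
   , ('2012-01-02'::date, 'c@example.com')
   , ('2012-01-03'::date, 'a@example.com')
   , ('2012-01-04'::date, 'a@example.com')
   , ('2012-01-05'::date, 'a@example.com')
   , ('2012-01-05'::date, 'a@example.com')
   , ('2012-01-06'::date, 'a@example.com')
   , ('2012-01-06'::date, 'a@example.com')
   , ('2012-01-06'::date, 'b@example.com')
)
SELECT g.day
     ,(SELECT count(DISTINCT s.email)
       FROM   sample s
       WHERE  s.day BETWEEN g.day - 2 AND g.day   -- the day itself plus the 2 days before
      ) AS dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval '1 day')::date) AS g(day)
ORDER  BY g.day;
```

Changing the 2 in g.day - 2 is the only edit needed to try a different window length.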

This sounded like a job for window functions at first, but I did not find a way to define the suitable window frame. Also, per documentation:

Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.

So I solved it with correlated subqueries instead. I guess that's the smartest way.

I renamed your date column to day, because it is bad practice to use type names as identifiers.

BTW, "between said date and 3 days ago" would be a period of 4 days. Your definition is contradictory there.
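
The extra day comes from BETWEEN including both endpoints. A quick check in PostgreSQL, using the last date of the test case as an arbitrary example:

```sql
-- date - integer yields a date; date - date yields an integer number of days.
SELECT date '2012-01-06' - 3                            AS three_days_ago   -- 2012-01-03
     , date '2012-01-06' - (date '2012-01-06' - 3) + 1  AS days_in_range;   -- 4
```

So a period of 3 days ending on a given day is BETWEEN day - 2 AND day, which is what the queries above use.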

A bit shorter, but slower for only a few days:

SELECT g.day, count(DISTINCT t.email) AS dist_emails  -- qualify day: both g and t have a column of that name
FROM  (SELECT generate_series('2012-01-01'::date
                            , '2012-01-06'::date, '1d')::date) AS g(day)
LEFT   JOIN tbl t ON t.day BETWEEN g.day - 2 AND g.day
GROUP  BY 1
ORDER  BY 1;

Answered by JMEls

Instead of specifying the dates, you could always use a dateadd function:

test.date > dateadd(dd,-7,getdate())
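
Note that dateadd() and getdate() are SQL Server functions and are not built into PostgreSQL. A rough PostgreSQL equivalent, assuming the column is of type date, subtracts an integer number of days from current_date. The inline test_table rows and the example.com addresses below are hypothetical and only serve to make the sketch runnable on its own.

```sql
-- Hypothetical rows dated relative to today, standing in for the question's test_table.
WITH test_table(date, email) AS (
   VALUES (current_date,      'a@example.com')
        , (current_date - 3,  'b@example.com')
        , (current_date - 10, 'c@example.com')   -- older than 7 days, filtered out
)
SELECT test.date, count(DISTINCT test.email) AS dist_emails
FROM   test_table AS test
WHERE  test.date > current_date - 7   -- for a timestamp column: now() - interval '7 days'
GROUP  BY test.date
ORDER  BY test.date;
```

This only bounds the overall range relative to today; it does not by itself produce a rolling count per day.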

Answered by user3827333

An example for sliding window distinct count:

SELECT b.day, count(DISTINCT a.user_id)
from glip_production.presences_1d a,
 (SELECT distinct(day), TIMESTAMPADD(day,-6, day) dt_start
  from glip_production.presences_1d t1) b
where a.day >= b.dt_start and a.day <= b.day and b.day > '2017-11-01'
group by b.day

Answered by maSTAShuFu

In SQL Server:

select test.date, count(distinct test.email)
from test_table as test
where convert(date, test.date) between '2012-01-01' and '2012-05-08'
group by test.date;

Hope this helps.