PostgreSQL: query for a count of distinct values in a rolling date range

Question

Asked by harold

I have a data set of email addresses and the dates on which those addresses were added to a table. An email address can have multiple entries on different dates. For example, given the data set below, I would like to get each date together with the count of distinct emails between that date and 3 days earlier.

Date   | email  
-------+----------------
1/1/12 | [email protected]
1/1/12 | [email protected]
1/1/12 | [email protected]
1/2/12 | [email protected]
1/2/12 | [email protected]
1/3/12 | [email protected]
1/4/12 | [email protected]
1/5/12 | [email protected]
1/5/12 | [email protected]
1/6/12 | [email protected]
1/6/12 | [email protected]
1/6/12 | [email protected]

The result set would look something like this with a date period of 3:

date   | count(distinct email)
-------+------
1/1/12 | 3
1/2/12 | 3
1/3/12 | 3
1/4/12 | 3
1/5/12 | 2
1/6/12 | 2

I can get a distinct count for a date range using the query below, but I am looking to get a count over a range for each day, so that I do not have to manually update the range for hundreds of dates.

select test.date, count(distinct test.email)  
from test_table as test  
where test.date between '2012-01-01' and '2012-05-08'  
group by test.date;

Help is appreciated.

Answer 1

Answered by Erwin Brandstetter

Test case:

CREATE TEMP TABLE tbl (day date, email text);
INSERT INTO tbl VALUES
 ('2012-01-01', '[email protected]')
,('2012-01-01', '[email protected]')
,('2012-01-01', '[email protected]')
,('2012-01-02', '[email protected]')
,('2012-01-02', '[email protected]')
,('2012-01-03', '[email protected]')
,('2012-01-04', '[email protected]')
,('2012-01-05', '[email protected]')
,('2012-01-05', '[email protected]')
,('2012-01-06', '[email protected]')
,('2012-01-06', '[email protected]')
,('2012-01-06', '[email protected]');

Query - returns only days where an entry exists in tbl:

SELECT day
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  day BETWEEN t.day - 2 AND t.day -- period of 3 days
      ) AS dist_emails
FROM   tbl t
WHERE  day BETWEEN '2012-01-01' AND '2012-01-06'  
GROUP  BY 1
ORDER  BY 1;

Or - return all days in the specified range, even if there are no rows for the day:

SELECT day
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  day BETWEEN g.day - 2 AND g.day
      ) AS dist_emails
FROM  (SELECT generate_series('2012-01-01'::date
                            , '2012-01-06'::date, '1d')::date) AS g(day);

Result:

day        | dist_emails
-----------+------------
2012-01-01 | 3
2012-01-02 | 3
2012-01-03 | 3
2012-01-04 | 3
2012-01-05 | 1
2012-01-06 | 2

This sounded like a job for window functions at first, but I did not find a way to define the suitable window frame. Also, per documentation:

Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.

So I solved it with correlated subqueries instead. I guess that's the smartest way.

I renamed your date column to day, because it is bad practice to use type names as identifiers.

BTW, "between said date and 3 days ago" would be a period of 4 days. Your definition is contradictory there.
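
The off-by-one can be checked with plain date arithmetic. The following is only a small sketch: the date literal and the column aliases are made up for illustration, and it relies on BETWEEN including both bounds.

```sql
-- BETWEEN includes both bounds, so the number of days covered
-- is (end - start) + 1. Subtracting two dates yields an integer.
SELECT d.day
     , d.day - (d.day - 3) + 1 AS days_covered_back_3  -- 4: "said date and 3 days ago"
     , d.day - (d.day - 2) + 1 AS days_covered_back_2  -- 3: the "period of 3" used in the queries
FROM  (SELECT date '2012-01-06' AS day) d;
```

This is why the queries use `day - 2` rather than `day - 3` as the lower bound.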

A bit shorter, but slower for only a few days:

SELECT g.day, count(DISTINCT t.email) AS dist_emails  -- qualify day: both g and t have a column of that name
FROM  (SELECT generate_series('2012-01-01'::date
                            , '2012-01-06'::date, '1d')::date) AS g(day)
LEFT   JOIN tbl t ON t.day BETWEEN g.day - 2 AND g.day
GROUP  BY 1
ORDER  BY 1;

Answer 2

Answered by JMEls

Instead of specifying the dates, you could always use a dateadd function:

test.date > dateadd(dd,-7,getdate())
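
Note that dateadd and getdate are SQL Server functions and are not built into PostgreSQL. A rough PostgreSQL equivalent of the same filter, assuming the date column in the question's test_table is of type date, could be sketched with plain date arithmetic:

```sql
-- Rows from the last 7 days, in PostgreSQL syntax.
-- Assumption: test.date is of type date, so "current_date - 7" is also a date.
SELECT test.date, count(DISTINCT test.email)
FROM   test_table AS test
WHERE  test.date > current_date - 7
GROUP  BY test.date;
```

If the column were a timestamp instead, `now() - interval '7 days'` would be the analogous lower bound.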

Answer 3

Answered by user3827333

An example for sliding window distinct count:

SELECT b.day, count(DISTINCT a.user_id)
from glip_production.presences_1d a,
 (SELECT distinct(day), TIMESTAMPADD(day,-6, day) dt_start
  from glip_production.presences_1d t1) b
where a.day >= b.dt_start and a.day <= b.day and b.day > '2017-11-01'
group by b.day
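
The query above uses TIMESTAMPADD, which PostgreSQL does not provide, and a table that is not part of the question. As a rough adaptation only, the same self-join idea applied to the tbl test case from Answer 1, keeping this answer's 7-day window, might look like this:

```sql
-- One row per distinct day in tbl, joined to all rows in the 7-day window ending on that day.
SELECT b.day, count(DISTINCT a.email) AS dist_emails
FROM  (SELECT DISTINCT day, day - 6 AS dt_start  -- window start: 6 days before day
       FROM   tbl) b
JOIN   tbl a ON a.day BETWEEN b.dt_start AND b.day
GROUP  BY b.day
ORDER  BY b.day;
```

Replacing `day - 6` with `day - 2` gives the 3-day period asked for in the question.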

Answer 4

Answered by maSTAShuFu

In SQL Server:

select test.date, count(distinct test.email)
from test_table as test
where convert(date, test.date) between '2012-01-01' and '2012-05-08'
group by test.date;

Hope this helps.
