Original source: http://stackoverflow.com/questions/346132/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Postgres - how to return rows with 0 count for missing data?
Asked by JV.
I have unevenly distributed data (with respect to date) for a few years (2003-2008). I want to query data for a given start and end date, grouping the data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).
The problem is that some of the queries give results that are continuous over the required period, such as this one:
select to_char(date_trunc('month', date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id = 1 and entity_id = 77 and entity2_id = 115
  and date <= '2008-12-06' and date >= '2007-12-01'
group by date_trunc('month', date)
order by date_trunc('month', date);
to_char | count
------------+-------
2007-12-01 | 64
2008-01-01 | 31
2008-02-01 | 14
2008-03-01 | 21
2008-04-01 | 28
2008-05-01 | 44
2008-06-01 | 100
2008-07-01 | 72
2008-08-01 | 91
2008-09-01 | 92
2008-10-01 | 79
2008-11-01 | 65
(12 rows)
But some of them skip intervals for which no data is present, such as this one:
select to_char(date_trunc('month', date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id = 1 and entity_id = 75 and entity2_id = 115
  and date <= '2008-12-06' and date >= '2007-12-01'
group by date_trunc('month', date)
order by date_trunc('month', date);
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-03-01 | 1
2008-04-01 | 2
2008-06-01 | 1
2008-08-01 | 3
2008-10-01 | 2
(7 rows)
whereas the required result set is:
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-02-01 | 0
2008-03-01 | 1
2008-04-01 | 2
2008-05-01 | 0
2008-06-01 | 1
2008-07-01 | 0
2008-08-01 | 3
2008-09-01 | 0
2008-10-01 | 2
2008-11-01 | 0
(12 rows)
A count of 0 for missing entries.
I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join with a calendar table or sequence table will not help, I guess.
My current solution to this is to fill in these gaps in Python (in a Turbogears App) using the calendar module.
Is there a better way to do this?
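For reference, the application-side gap filling described in the question could look roughly like the sketch below. It is not the asker's actual code: the function names are hypothetical, it uses only the standard datetime module rather than the calendar module, it handles monthly buckets only, and the end date is chosen so that the output matches the 12 rows of the required result set.

```python
from datetime import date


def month_starts(start: date, end: date):
    """Yield the first day of every month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(rows, start: date, end: date):
    """Return one (month_start, count) pair per month, using 0 where rows has no entry."""
    counts = dict(rows)
    return [(m, counts.get(m, 0)) for m in month_starts(start, end)]


# Rows as returned by the second query in the question (entity_id = 75).
rows = [
    (date(2007, 12, 1), 2),
    (date(2008, 1, 1), 2),
    (date(2008, 3, 1), 1),
    (date(2008, 4, 1), 2),
    (date(2008, 6, 1), 1),
    (date(2008, 8, 1), 3),
    (date(2008, 10, 1), 2),
]

filled = fill_missing_months(rows, date(2007, 12, 1), date(2008, 11, 1))
for month_start, count in filled:
    print(month_start.isoformat(), count)
```

Other grouping periods would need their own bucket generator, which is part of why doing this in the database is attractive.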
Answered by Erwin Brandstetter
This question is old, but since fellow users picked it as the master for a new duplicate, I am adding a proper answer.
Proper solution
SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
Use LEFT JOIN, of course.
generate_series() can produce a table of timestamps on the fly, and very fast.
It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in a related answer.
Cast the timestamp to date (::date) for a basic format. For more, use to_char().
GROUP BY 1 is syntax shorthand to reference the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that's too long for my taste.
Works with the available interval arguments for date_trunc().
count() never produces NULL (0 for no rows), but the LEFT JOIN does. To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.
For a more generic solution or arbitrary time intervals, consider a closely related answer.
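Since the question says the grouping period is chosen at runtime by a Python application, one way to combine these points is to let the application pick the date_trunc() unit and a matching generate_series() step, then bind both as query parameters. The following is a minimal sketch under several assumptions: the names STEP_FOR_UNIT, QUERY_TEMPLATE and build_query are hypothetical, the %(name)s placeholders assume a driver using the pyformat parameter style (psycopg2, for example), tbl and date_col are the placeholder names from the query above, and the SQL text has not been run against a database here. It also applies COALESCE in the outer SELECT and truncates the series start so that series values line up with the truncated bucket values.

```python
from datetime import date

# generate_series() step for each date_trunc() unit named in the question.
# A quarter is expressed as 3 months here.
STEP_FOR_UNIT = {
    "day": "1 day",
    "week": "1 week",
    "month": "1 month",
    "quarter": "3 months",
    "year": "1 year",
}

# Same shape as the query in the answer, with the unit, step and date range
# turned into bind parameters and COALESCE added for the zero counts.
QUERY_TEMPLATE = """
SELECT day, COALESCE(t.some_count, 0) AS some_count
FROM  (
   SELECT day::date
   FROM   generate_series(date_trunc(%(unit)s, %(start)s::timestamp)
                        , %(end)s::timestamp
                        , %(step)s::interval) day
   ) d
LEFT   JOIN (
   SELECT date_trunc(%(unit)s, date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= %(start)s::date
   AND    date_col <= %(end)s::date
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
"""


def build_query(unit: str, start: date, end: date):
    """Return (sql, params) for a zero-filled count per `unit` bucket."""
    try:
        step = STEP_FOR_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported grouping unit: {unit!r}") from None
    return QUERY_TEMPLATE, {"unit": unit, "step": step, "start": start, "end": end}


sql, params = build_query("quarter", date(2007, 12, 1), date(2008, 12, 6))
print(params["unit"], params["step"])
```

The pair returned by build_query would then be passed to the driver's execute call, for example cursor.execute(sql, params). Restricting the unit to a fixed mapping keeps arbitrary application input out of the query.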
Answered by Martin v. Löwis
You can create the list of the first days of all months in the last year (say) with:
select distinct date_trunc('month', (current_date - offs)) as date
from generate_series(0,365,28) as offs;
date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01
Then you can join with that series.
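The series in this answer works because it steps back in 28-day increments, which is no longer than the shortest month, so every month in the range is hit at least once and DISTINCT removes the repeats. The same arithmetic can be checked outside the database with the sketch below. The function name is hypothetical, and the fixed "today" of 2008-12-06 is an assumption taken from the end date in the question, not something stated in the answer.

```python
from datetime import date, timedelta


def month_starts_last_year(today: date):
    """Mimic: select distinct date_trunc('month', current_date - offs)
    from generate_series(0, 365, 28) as offs."""
    offsets = range(0, 366, 28)  # 0, 28, ..., 364, as generate_series(0, 365, 28) yields
    # replace(day=1) plays the role of date_trunc('month', ...); the set removes duplicates.
    return sorted({(today - timedelta(days=offs)).replace(day=1) for offs in offsets})


months = month_starts_last_year(date(2008, 12, 6))
for m in months:
    print(m.isoformat())
```

With that assumed date, the sketch yields the 13 month starts from 2007-12-01 through 2008-12-01, the same set of months as the output shown above.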
Answered by ng.mangine
You could create a temporary table at runtime and left join on that. That seems to make the most sense.