postgresql Postgres - How to return rows with a count of 0 for missing data?

Question

Asked by JV.

I have unevenly distributed data (with respect to date) for a few years (2003-2008). I want to query data for a given start and end date, grouping the data by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).

The problem is that some of the queries give results that are continuous over the required period, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 77  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |    64
 2008-01-01 |    31
 2008-02-01 |    14
 2008-03-01 |    21
 2008-04-01 |    28
 2008-05-01 |    44
 2008-06-01 |   100
 2008-07-01 |    72
 2008-08-01 |    91
 2008-09-01 |    92
 2008-10-01 |    79
 2008-11-01 |    65
(12 rows)

But some of them skip intervals because no data is present, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 75  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-03-01 |     1
 2008-04-01 |     2
 2008-06-01 |     1
 2008-08-01 |     3
 2008-10-01 |     2
(7 rows)

where the required resultset is:

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-02-01 |     0
 2008-03-01 |     1
 2008-04-01 |     2
 2008-05-01 |     0
 2008-06-01 |     1
 2008-07-01 |     0
 2008-08-01 |     3
 2008-09-01 |     0
 2008-10-01 |     2
 2008-11-01 |     0
(12 rows)

A count of 0 for missing entries.

I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join with a calendar table or sequence table will not help, I guess.

My current solution to this is to fill in these gaps in Python (in a Turbogears App) using the calendar module.

Is there a better way to do this?

Answer 1

Answered by Erwin Brandstetter

This question is old. But since fellow users picked it as master for a new duplicate I am adding a proper answer.

Proper solution

SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;

Use LEFT JOIN, of course.
generate_series() can produce a table of timestamps on the fly, and very fast.
It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:
- PostgreSQL - order by an array
Cast the timestamp to date (::date) for a basic format. For more, use to_char().
GROUP BY 1 is syntax shorthand to reference the first output column. Could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that's too long for my taste.
Works with the available interval arguments for date_trunc().
count() never produces NULL (0 for no rows), but the LEFT JOIN does.
To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.
For a more generic solution or arbitrary time intervals, consider this closely related answer:
- Best way to count records by arbitrary time intervals in Rails+Postgres
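
As a rough sketch of how the notes above combine, the query below switches the grouping unit to quarters, formats the period with to_char(), and replaces NULL with 0 via COALESCE(). It reuses the placeholder names tbl and date_col from the answer, and it assumes a PostgreSQL version whose generate_series() accepts timestamps. The start of the series is truncated to the same unit so that the generated periods line up with the truncated groups.

```sql
-- Sketch only: tbl and date_col are the placeholder names used in the answer.
-- To change the grouping unit, change the unit passed to date_trunc()
-- (in both places) and use a matching step:
--   'day' -> '1 day', 'week' -> '1 week', 'month' -> '1 month',
--   'quarter' -> '3 months', 'year' -> '1 year'
SELECT to_char(day, 'YYYY-MM-DD') AS period
     , COALESCE(some_count, 0)    AS some_count   -- 0 instead of NULL for empty periods
FROM  (
   SELECT day::date
   FROM   generate_series(date_trunc('quarter', timestamp '2007-12-01')  -- align start to the unit
                        , timestamp '2008-12-06'
                        , interval '3 months') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('quarter', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

Since the application already picks the unit at runtime, it could pass the unit and the matching step as two query parameters instead of editing the SQL text.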

Answer 2

Answered by Martin v. Löwis

You can create the list of the first day of each month for the last year (say) with

select distinct date_trunc('month', (current_date - offs)) as date 
from generate_series(0,365,28) as offs
order by 1;
          date
------------------------
 2007-12-01 00:00:00+01
 2008-01-01 00:00:00+01
 2008-02-01 00:00:00+01
 2008-03-01 00:00:00+01
 2008-04-01 00:00:00+02
 2008-05-01 00:00:00+02
 2008-06-01 00:00:00+02
 2008-07-01 00:00:00+02
 2008-08-01 00:00:00+02
 2008-09-01 00:00:00+02
 2008-10-01 00:00:00+02
 2008-11-01 00:00:00+01
 2008-12-01 00:00:00+01

Then you can join with that series.
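
One possible way to write that join is sketched below, using the table and filter values from the question (some_table, post_id, category_id, entity_id, entity2_id). The aliases m, s, period and post_count are made up for illustration. The filters sit in the ON clause rather than in WHERE, so months without matching rows are kept and counted as 0.

```sql
-- Sketch only: aliases m, s, period and post_count are illustrative.
SELECT to_char(m.month, 'YYYY-MM-DD') AS period
     , count(DISTINCT s.post_id)      AS post_count   -- counts 0 when no row matched
FROM  (
   -- first day of each month over roughly the last year
   SELECT DISTINCT date_trunc('month', current_date - offs) AS month
   FROM   generate_series(0, 365, 28) AS offs
   ) m
LEFT   JOIN some_table s
       ON  date_trunc('month', s.date) = m.month
       AND s.category_id = 1
       AND s.entity_id   = 75
       AND s.entity2_id  = 115
GROUP  BY m.month
ORDER  BY m.month;
```

The 28-day step works because it is no longer than the shortest month, so no month is skipped; DISTINCT removes the months that are hit twice.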

Answer 3

Answered by ng.mangine

You could create a temporary table at runtime and left join on that. That seems to make the most sense.
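
A minimal sketch of that idea follows. The temporary table name periods is hypothetical, the other names come from the question, and the sketch assumes a PostgreSQL version whose generate_series() accepts timestamps (an integer series of offsets could fill the table otherwise).

```sql
-- Sketch only: "periods" is a hypothetical temporary table name.
CREATE TEMPORARY TABLE periods AS
SELECT day::date
FROM   generate_series(timestamp '2007-12-01'
                     , timestamp '2008-11-01'
                     , interval  '1 month') day;

SELECT to_char(p.day, 'YYYY-MM-DD') AS period
     , count(DISTINCT s.post_id)    AS post_count   -- 0 for months without rows
FROM   periods p
LEFT   JOIN some_table s
       ON  date_trunc('month', s.date)::date = p.day
       AND s.category_id = 1
       AND s.entity_id   = 75
       AND s.entity2_id  = 115
GROUP  BY p.day
ORDER  BY p.day;

DROP TABLE periods;
```

The application would fill the temporary table with whatever unit and date range it picked at runtime, then run the same join.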
