Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/8193688/

Time: 2020-09-01 13:20:13  Source: igfitidea

PostgreSQL: running count of rows for a query 'by minute'

Tags: sql, postgresql, datetime, aggregate-functions, window-functions

Asked by GabiMe

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:

SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
  FROM mytable
 GROUP BY hour, minute

Answered by Erwin Brandstetter

Only return minutes with activity

Shortest

SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;
  • Use date_trunc(); it returns exactly what you need.

  • Don't include id in the query, since you want to GROUP BY minute slices.

  • count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. I quote the manual:

    The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

    And that happens to be exactly what you need.

  • Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, that has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).

  • You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
    You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.

  • ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
    1 is a positional reference to the 1st expression in the SELECT list.

  • Use to_char() if you need to format the result. Like:

SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY date_trunc('minute', "when");
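
The points above about peers, DISTINCT and count(*) versus count(id) can be tried out without a PostgreSQL server. The following is only a sketch: it runs the query shape in SQLite through Python's sqlite3 module (assuming a bundled SQLite of 3.25 or later, which has window functions), uses strftime() as a stand-in for date_trunc('minute', ...), and works on made-up sample rows, one of which has a NULL id.

```python
import sqlite3

# Hypothetical sample data: three events in minute 10:00, one in 10:02.
events = [
    (1, "2020-09-01 10:00:05"),
    (2, "2020-09-01 10:00:30"),
    (None, "2020-09-01 10:00:59"),  # NULL id: seen by count(*), skipped by count(id)
    (4, "2020-09-01 10:02:10"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER, "when" TEXT)')
conn.executemany("INSERT INTO mytable VALUES (?, ?)", events)

# strftime() replaces PostgreSQL's date_trunc('minute', ...) in this sketch.
# All rows of the same minute are peers, so they get the same running count,
# and DISTINCT (applied after the window functions) folds them into one row.
running = conn.execute("""
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*)  OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ct
         , count(id) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ids
    FROM   mytable
    ORDER  BY 1
""").fetchall()

for row in running:
    print(row)
```

With this sample data, running_ids falls one behind running_ct from the first minute on, because the row with the NULL id is not counted by count(id).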

Fastest

SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;

Much like the above, but:

  • I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

  • Use sum() as a window aggregate function now to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.
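
To check that the two variants return the same rows, here is a small sketch that runs both against SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later). strftime() again stands in for date_trunc(), and the sample rows are invented. It says nothing about speed, only about equivalence of the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("when" TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("2020-09-01 10:00:05",), ("2020-09-01 10:00:30",),
     ("2020-09-01 10:01:15",), ("2020-09-01 10:03:40",)],  # hypothetical events
)

# Variant 1: window count over all rows, duplicates removed by DISTINCT.
distinct_rows = conn.execute("""
    SELECT DISTINCT
           strftime('%Y-%m-%d %H:%M', "when") AS minute
         , count(*) OVER (ORDER BY strftime('%Y-%m-%d %H:%M', "when")) AS running_ct
    FROM   tbl
    ORDER  BY 1
""").fetchall()

# Variant 2: aggregate to one row per minute first, then sum() the counts.
subquery_rows = conn.execute("""
    SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
    FROM  (
       SELECT strftime('%Y-%m-%d %H:%M', "when") AS minute
            , count(*) AS minute_ct
       FROM   tbl
       GROUP  BY 1
       ) sub
    ORDER  BY 1
""").fetchall()

print(distinct_rows == subquery_rows)
print(subquery_rows)
```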

Include minutes without activity

Shortest

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):

SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;
  • Generate a row for every minute in the time frame between the first and the last event with generate_series(), here directly based on aggregated values from the subquery.

  • LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
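
The gap-filling idea can be sketched outside PostgreSQL as well. Python's bundled SQLite has no generate_series(), so this example swaps in a recursive CTE to produce the minute series, and strftime()/datetime() replace date_trunc() and interval arithmetic. Those substitutions and the sample rows are assumptions of the sketch, not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("when" TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    # Hypothetical events: nothing happens in minutes 10:01 and 10:02.
    [("2020-09-01 10:00:05",), ("2020-09-01 10:00:30",), ("2020-09-01 10:03:40",)],
)

gap_filled = conn.execute("""
    WITH RECURSIVE m(minute) AS (
       -- first minute with activity
       SELECT min(strftime('%Y-%m-%d %H:%M:00', "when")) FROM tbl
       UNION ALL
       -- step one minute at a time up to the last minute with activity
       SELECT datetime(minute, '+1 minute')
       FROM   m
       WHERE  minute < (SELECT max(strftime('%Y-%m-%d %H:%M:00', "when")) FROM tbl)
    )
    SELECT DISTINCT
           m.minute, count(c.minute) OVER (ORDER BY m.minute) AS running_ct
    FROM   m
    LEFT   JOIN (
       SELECT strftime('%Y-%m-%d %H:%M:00', "when") AS minute FROM tbl
       ) c USING (minute)
    ORDER  BY 1
""").fetchall()

for row in gap_filled:
    print(row)
```

Minutes without events join to a NULL c.minute, which count(c.minute) ignores, so the running count simply carries over from the previous minute.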

Fastest

With CTE:

WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) 
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;
  • Again, aggregate and count rows per minute in the first step. This removes the need for DISTINCT later.

  • Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
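
The difference between count() and sum() on an empty or all-NULL input is standard SQL behavior and easy to see in isolation. A minimal sketch using SQLite through Python's sqlite3 module, with a hypothetical empty table named counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (minute_ct INTEGER)")  # hypothetical, left empty

# Over zero rows, count() yields 0 while sum() yields NULL (None in Python).
# COALESCE replaces that NULL with 0.
empty = conn.execute("""
    SELECT count(minute_ct)
         , sum(minute_ct)
         , COALESCE(sum(minute_ct), 0)
    FROM   counts
""").fetchone()

print(empty)
```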

With many rows and an index on "when", this version with a subquery was the fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        ,                      max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;