SQL PostgreSQL: running count of rows for a query by minute
Original question: http://stackoverflow.com/questions/8193688/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
PostgreSQL: running count of rows for a query 'by minute'
Asked by GabiMe
I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Answered by Erwin Brandstetter
Only return minutes with activity
Shortest
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc(), it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. I quote the manual: "The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer." And that happens to be exactly what you need.
Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here. 1 is a positional reference to the 1st expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
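The role of ORDER BY peers in the default frame can be checked with a small self-contained sketch. The sample rows below are invented for illustration and stand in for mytable via a CTE; the second window column uses an explicit ROWS frame only for contrast and is not part of the answer's query.

```sql
-- Hypothetical sample data: two rows in minute 10:00, one in 10:01, one in 10:03.
WITH mytable(id, "when") AS (
   VALUES (1, timestamp '2011-11-19 10:00:05')
        , (2, timestamp '2011-11-19 10:00:40')
        , (3, timestamp '2011-11-19 10:01:10')
        , (4, timestamp '2011-11-19 10:03:30')
   )
SELECT id
     , date_trunc('minute', "when") AS minute
       -- default frame (RANGE): counts up to the last peer of the current row
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
       -- explicit ROWS frame: counts up to the current row only
     , count(*) OVER (ORDER BY date_trunc('minute', "when")
                      ROWS UNBOUNDED PRECEDING) AS rows_ct
FROM   mytable
ORDER  BY minute, id;
```

With this data, both rows in minute 10:00 should get running_ct = 2, while rows_ct is 1 for one of them and 2 for the other. That is why SELECT DISTINCT collapses the default-frame result to one row per minute, and why it would not do so with a ROWS frame.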
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) sub
ORDER BY 1;
Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
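Whether the subquery variant is faster on a given installation is easy to check rather than assume. The following is only a rough sketch: the temporary table, its size, and the random distribution of timestamps are assumptions made up for the test, not part of the question.

```sql
-- Hypothetical test table: 1 million rows spread randomly over one day.
CREATE TEMP TABLE tbl AS
SELECT g AS id
     , timestamp '2011-11-19 00:00' + random() * interval '1 day' AS "when"
FROM   generate_series(1, 1000000) g;

ANALYZE tbl;

-- Variant 1: window function over every row, then DISTINCT
EXPLAIN (ANALYZE)
SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   tbl
ORDER  BY 1;

-- Variant 2: aggregate to one row per minute first, then a running sum
EXPLAIN (ANALYZE)
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

Compare the reported execution times of the two plans. Note that sum() over a bigint count returns numeric, so cast running_ct to bigint if the result type matters.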
Include minutes without activity
Shortest
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series(), here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
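To see the gap filling at work, the same query can be run against a few invented rows. The CTE named tbl and its contents are assumptions for illustration; minute 10:02 is deliberately left without any event.

```sql
-- Hypothetical sample data: no event during minute 10:02.
WITH tbl(id, "when") AS (
   VALUES (1, timestamp '2011-11-19 10:00:05')
        , (2, timestamp '2011-11-19 10:00:40')
        , (3, timestamp '2011-11-19 10:01:10')
        , (4, timestamp '2011-11-19 10:03:30')
   )
SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   -- one row per minute from the first to the last event
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;
```

With this data the result should have four rows, with running_ct 2, 3, 3, 4 for minutes 10:00 through 10:03. The row for 10:02 comes from the LEFT JOIN with c.minute being NULL, so count(c.minute) carries the previous total forward instead of increasing it.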
Fastest
With CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM tbl
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(min(minute), max(minute), interval '1 min')
FROM cte
) m(minute)
LEFT JOIN cte USING (minute)
ORDER BY 1;
Again, aggregate and count rows per minute in the first step; this removes the need for a later DISTINCT.
Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
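The NULL behaviour of sum() as a window aggregate can be illustrated in isolation. The inline rows below are made up and simply mimic the result of the LEFT JOIN, with integer stand-ins for the minutes and NULL where a minute has no matching count.

```sql
-- Hypothetical per-minute counts after the LEFT JOIN; NULL means "no rows that minute".
SELECT minute
     , sum(minute_ct) OVER (ORDER BY minute)              AS raw_sum
     , COALESCE(sum(minute_ct) OVER (ORDER BY minute), 0) AS running_ct
FROM  (VALUES (1, NULL::int), (2, 3), (3, NULL), (4, 2)) AS v(minute, minute_ct);
```

Here raw_sum is NULL for minute 1, because the frame contains only NULL input so far, while running_ct is 0. From minute 2 on both columns agree (3, 3, 5), since sum() ignores NULL inputs once there is at least one non-null value in the frame. In the query above the series starts at the first minute with activity, so COALESCE mainly acts as a safeguard there.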
With many rows and an index on "when", this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM tbl
) m(minute)
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM tbl
GROUP BY 1
) c USING (minute)
ORDER BY 1;