原文地址: http://stackoverflow.com/questions/7299342/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
What is the fastest way to truncate timestamps to 5 minutes in Postgres?
Asked by DNS
Postgres can round (truncate) timestamps using the date_trunc function, like this:
date_trunc('hour', val)
date_trunc('minute', val)
I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:
date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'
Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.
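As a quick sanity check of the expression above, the following sketch runs it against a few literal timestamps. The sample values and the column aliases are made up for illustration; only the expression itself comes from the question. The VALUES list in FROM assumes PostgreSQL 8.2 or later.

```sql
-- Sample timestamps are illustrative; the expression is the one from the question.
SELECT val,
       date_trunc('hour', val)
         + date_part('minute', val)::int / 5 * interval '5 min' AS truncated
FROM (VALUES (timestamp '2011-09-04 14:26:57'),
             (timestamp '2011-09-04 14:29:59'),
             (timestamp '2011-09-04 14:30:00')) AS t(val);
-- truncated values for the three inputs: 14:25:00, 14:25:00, 14:30:00 (same date)
```

The ::int cast matters here: it turns / 5 into an integer division, so minute 26 gives 5 whole slices, which is then multiplied back into 25 minutes.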
Accepted answer by a_horse_with_no_name
I don't think there is any quicker method.
And I don't think you should be worried about the performance of the expression.
Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.
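One way to check this on your own data is to time the same scan with and without the expression. This is only a rough sketch: it assumes a hypothetical table huge_table with a timestamp column named time, and PostgreSQL 9.0 or later for the parenthesized EXPLAIN options.

```sql
-- Baseline: read the column without any date arithmetic.
-- huge_table and its "time" column are assumed names, not part of the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ht.time
FROM huge_table AS ht;

-- Same scan, plus the truncation expression from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('hour', ht.time)
       + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht;
```

If the two reported execution times are close, the date arithmetic is probably not the bottleneck for that query.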
Answer by André C. Andersen
I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.
I informally benchmarked against one of our larger tables. I limited the query to the first 4 million rows. I alternated between the two queries in order to avoid giving one an unfair advantage due to database caching.
Going through epoch/unix time
SELECT to_timestamp(
floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
* EXTRACT(epoch FROM interval '5 min')
) FROM huge_table AS ht LIMIT 4000000
(Note this produces timestamptz even if you used a time zone unaware datatype)
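If the input column is a plain timestamp and you want a plain timestamp back, one option is to convert the result with AT TIME ZONE 'UTC'. This works because the epoch of a timestamp without time zone is computed without regard to any time zone, so reading the result back as UTC wall-clock time restores the original frame. A minimal sketch with a made-up sample value (300 is simply 5 minutes in seconds):

```sql
-- Sample value is illustrative. 300 = number of seconds in 5 minutes.
SELECT to_timestamp(floor(EXTRACT(epoch FROM val) / 300) * 300)
         AT TIME ZONE 'UTC' AS truncated
FROM (VALUES (timestamp '2011-09-04 14:26:57')) AS t(val);
-- truncated: 2011-09-04 14:25:00 (timestamp without time zone)
```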
Results
- Run 1: 39.368 seconds
- Run 3: 39.526 seconds
- Run 5: 39.883 seconds
Using date_trunc and date_part
SELECT
date_trunc('hour', ht.time)
+ date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht LIMIT 4000000
Results
- Run 2: 34.189 seconds
- Run 4: 37.028 seconds
- Run 6: 32.397 seconds
System
- DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
- Cores: Intel® Xeon® E5-1650v2, Hexa-Core
- RAM: 64 GB, DDR3 ECC RAM
Conclusion
Your version seems to be faster, but not by enough to matter for my specific use case. Not having to specify the hour makes the epoch version more versatile and gives simpler parameterization in client-side code. It handles 2 hour intervals just as well as 5 minute intervals without having to bump the date_trunc time unit argument up. On a final note, I wish this time unit argument were changed to a time interval argument instead.
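To illustrate the parameterization point, here is a sketch in which the bucket size is an ordinary interval value. The sample values and aliases are made up. The second output column uses date_bin, which exists in PostgreSQL 14 and later (newer than the 9.6.2 server benchmarked above) and takes an interval argument directly; the origin timestamp '2000-01-01' is an arbitrary choice that aligns buckets to midnight.

```sql
-- Sample rows are illustrative: the same timestamp with two bucket sizes.
SELECT bucket,
       -- Epoch approach; AT TIME ZONE 'UTC' turns the timestamptz result
       -- back into a timestamp without time zone.
       to_timestamp(
           floor(EXTRACT(epoch FROM val) / EXTRACT(epoch FROM bucket))
           * EXTRACT(epoch FROM bucket)
       ) AT TIME ZONE 'UTC' AS via_epoch,
       -- PostgreSQL 14+ only: stride, source, origin.
       date_bin(bucket, val, timestamp '2000-01-01') AS via_date_bin
FROM (VALUES (timestamp '2011-09-04 14:26:57', interval '2 hours'),
             (timestamp '2011-09-04 14:26:57', interval '5 min')) AS t(val, bucket);
-- 2 hours: both columns give 2011-09-04 14:00:00
-- 5 min:   both columns give 2011-09-04 14:25:00
```

In both variants only the interval value changes between bucket sizes, which is what keeps the client-side code simple.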
Answer by Benjamin Crouzier
Full query for those wondering (based on @DNS's question):
Assuming you have orders and you want to count them in 5-minute slices per shop_id:
SELECT date_trunc('hour', created_at) + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
, shop_id, count(id) as orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC
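To try the query without an existing table, the sketch below builds a small temporary one. The table definition and the sample rows are assumptions made up for illustration, and shop_id is added to ORDER BY only to make the output order deterministic.

```sql
-- Hypothetical schema and data, just enough to exercise the query.
CREATE TEMP TABLE orders (
    id         serial PRIMARY KEY,
    shop_id    int NOT NULL,
    created_at timestamp NOT NULL
);

INSERT INTO orders (shop_id, created_at) VALUES
    (1, '2011-09-04 14:26:57'),
    (1, '2011-09-04 14:28:10'),
    (1, '2011-09-04 14:31:00'),
    (2, '2011-09-04 14:27:30');

SELECT date_trunc('hour', created_at)
         + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
     , shop_id, count(id) AS orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC, shop_id;
-- 2011-09-04 14:25:00 | 1 | 2
-- 2011-09-04 14:25:00 | 2 | 1
-- 2011-09-04 14:30:00 | 1 | 1
```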