MySQL SELECT / GROUP BY - segments of time (10 seconds, 30 seconds, etc)
Source: http://stackoverflow.com/questions/3086386/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Asked by Eric Anderson
I have a table (MySQL) that captures samples every n seconds. The table has many columns, but all that matters for this is two: a time stamp (of type TIMESTAMP) and a count (of type INT).
What I would like to do is get sums and averages of the count column over ranges of time. For instance, samples are recorded every 2 seconds, but I would like the sum of the count column over every 10-second or 30-second window.
Here's an example of the data:
+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+
The output I would like (based on the data above) should look like this:
+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 | # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 | # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:35 |             450 | # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+
I have used GROUP BY to gather these numbers by the second, or by the minute, but I can't seem to figure out the syntax to get the sub-minute or range of seconds GROUP BY commands to work correctly.
I am mostly going to be using this query to syphon data from this table to another table.
Thanks!
Answered by Hammerite
GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30
or, say for some reason you wanted to group them in 20-second intervals, it would be DIV 20, etc. To change the boundaries between GROUP BY values you could use
GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30
where r is a literal nonnegative integer less than 30. So
GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30
should give you sums between hh:mm:25 and hh:mm:55 and between hh:mm:55 and hh:mm+1:25, because adding r moves each boundary r seconds earlier. To get boundaries at hh:mm:05 and hh:mm:35, use r = 25 instead.
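The bucketing arithmetic in this answer can be checked outside the database. The following is a minimal Python sketch, not part of the original answer, that mimics `(UNIX_TIMESTAMP(time_stamp) + r) DIV 30` with integer floor division on the question's sample data. It assumes the timestamps are in UTC (MySQL would use the session time zone), and the helper names `to_epoch`, `from_epoch` and `bucket_sums` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# The question's sample rows, grouped by time_stamp: {time_stamp: [count, ...]}.
SAMPLES = {
    "2010-06-15 23:35:28": [1],
    "2010-06-15 23:35:30": [1, 1, 942, 180, 4, 52, 12, 1, 1],
    "2010-06-15 23:35:33": [1468, 247, 1, 81, 16],
    "2010-06-15 23:35:35": [1828, 214, 75, 8],
    "2010-06-15 23:35:37": [1799, 24, 11, 2],
    "2010-06-15 23:35:40": [575, 1],
    "2010-06-17 10:39:35": [2, 2, 1, 2, 1],
    "2010-06-17 10:39:40": [35, 19, 37],
    "2010-06-17 10:39:42": [64, 3, 31, 7, 246],
}


def to_epoch(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (an assumption) into Unix seconds."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_epoch(seconds):
    """Format Unix seconds back into 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def bucket_sums(samples, width, r=0):
    """Sum counts per bucket, where the bucket key is (epoch + r) // width."""
    totals = defaultdict(int)
    for text, counts in samples.items():
        key = (to_epoch(text) + r) // width
        totals[key] += sum(counts)
    # Label each bucket with the first second that falls into it.
    return {from_epoch(key * width - r): total for key, total in sorted(totals.items())}


plain = bucket_sums(SAMPLES, 30)         # like ... DIV 30
shifted = bucket_sums(SAMPLES, 30, r=5)  # like (... + 5) DIV 30

for label, total in plain.items():
    print("r=0", label, total)
for label, total in shifted.items():
    print("r=5", label, total)
```

With r = 0 the sums match the ones requested in the question (1, 7544 and 450), and the last bucket starts at 10:39:30. With r = 5 the printed bucket labels start at seconds :25 and :55.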
Answered by mac13k
I tried Hammerite's solution in my project, but it didn't work well where there were missing samples from the series. Here's an example of the query that is supposed to select timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals:
select
    min(ts),
    user_name,
    sum(measure) / 27
from metric_table
where
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00'
group by unix_timestamp(ts) div 1620, user_name
order by ts, user_name
;
Note: 27 minutes (in the select) = 1620 seconds (in the group by), and 2160 minutes = 36 hours, i.e. 1.5 days (that's the time range).
When I ran this query against a time series where samples were irregularly recorded (in other words: for any given time stamp there was no guarantee of finding measure values for all user names), the result timestamps were not aligned to the interval (they were not spaced 27 minutes apart). I suspect that was due to min(ts) returning, in some groups, a time stamp greater than the expected floor(ts0 + i*interval). I modified the former query to this one:
select
    from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1,
    user_name,
    sum(measure) / 27
from metric_table
where
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00'
group by ts1, user_name
order by ts1, user_name
;
and it works fine even when samples are missing. I think that is because once the time math is moved into the select list, ts1 is guaranteed to align with the time steps.
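To illustrate the difference between the two labels, here is a small Python sketch, not taken from the answer, that computes both `min(ts)` per bucket and the floored bucket start for a few irregular samples. The rows are made up for illustration, the timestamps are assumed to be UTC, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

WIDTH = 1620  # 27 minutes in seconds, as in the queries above


def to_epoch(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (an assumption) into Unix seconds."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_epoch(seconds):
    """Format Unix seconds back into 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical, irregularly spaced samples for one user: (ts, measure).
rows = [
    ("2015-03-16 00:00:00", 10),
    ("2015-03-16 00:05:00", 20),
    ("2015-03-16 00:40:00", 30),
    ("2015-03-16 00:50:00", 40),
]

min_label = {}    # bucket -> min(ts), the label used by the first query
floor_label = {}  # bucket -> ts - ts mod WIDTH, the label used by the second query
for text, _measure in rows:
    t = to_epoch(text)
    bucket = t // WIDTH
    min_label[bucket] = min(min_label.get(bucket, t), t)
    floor_label[bucket] = t - t % WIDTH

for bucket in sorted(floor_label):
    print(from_epoch(min_label[bucket]), "vs", from_epoch(floor_label[bucket]))
```

The floored labels are always a whole number of 1620-second steps apart, while the `min(ts)` labels depend on which sample happens to arrive first in each bucket. Note that the floored labels are aligned to the Unix epoch, not to the start of the range in the where clause.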
Answered by Wera
Another solution.
To average over any interval you like, you can convert your datetime column to a Unix timestamp, subtract its remainder modulo the interval (7 seconds in the example), and group by the result.
select FROM_UNIXTIME(
UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7
) as dt, avg(1das4hrz) from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00'
group by FROM_UNIXTIME(
UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7);
To show how it works, here is a query that displays the intermediate calculations.
select dt_record, minute(dt_record) as mm, SECOND(dt_record) as ss,
UNIX_TIMESTAMP(dt_record) as uxt, UNIX_TIMESTAMP(dt_record) mod 7 as ux7,
FROM_UNIXTIME(
UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7) as dtsub,
column from `yourtable` where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00';
+---------------------+--------------------+
| dt | avg(column) |
+---------------------+--------------------+
| 2016-11-13 04:59:43 | 25434.85714285714 |
| 2016-11-13 05:00:42 | 5700.728813559322 |
| 2016-11-13 05:01:41 | 950.1016949152543 |
| 2016-11-13 05:02:40 | 4671.220338983051 |
| 2016-11-13 05:03:39 | 25468.728813559323 |
| 2016-11-13 05:04:38 | 43883.52542372881 |
| 2016-11-13 05:05:37 | 24589.338983050846 |
+---------------------+--------------------+
+---------------------+------+-----+------------+------+---------------------+----------+
| dt_record | mm | ss | uxt | ux7 | dtsub | column |
+---------------------+------+-----+------------+------+---------------------+----------+
| 2016-11-13 05:00:00 | 0 | 0 | 1479002400 | 1 | 2016-11-13 04:59:59 | 36137 |
| 2016-11-13 05:00:01 | 0 | 1 | 1479002401 | 2 | 2016-11-13 04:59:59 | 36137 |
| 2016-11-13 05:00:02 | 0 | 2 | 1479002402 | 3 | 2016-11-13 04:59:59 | 36137 |
| 2016-11-13 05:00:03 | 0 | 3 | 1479002403 | 4 | 2016-11-13 04:59:59 | 34911 |
| 2016-11-13 05:00:04 | 0 | 4 | 1479002404 | 5 | 2016-11-13 04:59:59 | 34911 |
| 2016-11-13 05:00:05 | 0 | 5 | 1479002405 | 6 | 2016-11-13 04:59:59 | 34911 |
| 2016-11-13 05:00:06 | 0 | 6 | 1479002406 | 0 | 2016-11-13 05:00:06 | 33726 |
| 2016-11-13 05:00:07 | 0 | 7 | 1479002407 | 1 | 2016-11-13 05:00:06 | 32581 |
| 2016-11-13 05:00:08 | 0 | 8 | 1479002408 | 2 | 2016-11-13 05:00:06 | 32581 |
| 2016-11-13 05:00:09 | 0 | 9 | 1479002409 | 3 | 2016-11-13 05:00:06 | 31475 |
+---------------------+------+-----+------------+------+---------------------+----------+
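The uxt, ux7 and dtsub columns above can be reproduced with plain integer arithmetic. The sketch below is a hedged Python illustration rather than anything from the answer. It assumes the server's time zone is UTC+3, which is what makes '2016-11-13 05:00:00' correspond to 1479002400, and the function name `floor_to_step` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is UTC+3; this matches the uxt values in the table.
TZ = timezone(timedelta(hours=3))
STEP = 7  # interval in seconds, as in the example


def floor_to_step(text, step=STEP):
    """Return (uxt, uxt mod step, floored local time) for 'YYYY-MM-DD HH:MM:SS'."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=TZ)
    uxt = int(dt.timestamp())
    rem = uxt % step
    floored = datetime.fromtimestamp(uxt - rem, tz=TZ)
    return uxt, rem, floored.strftime("%Y-%m-%d %H:%M:%S")


for ss in range(10):
    print(floor_to_step(f"2016-11-13 05:00:{ss:02d}"))
```

Because the remainder is taken on the Unix timestamp, the 7-second buckets are anchored to the epoch and not to the start of a minute, which is why 05:00:00 falls into a bucket that starts at 04:59:59.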
Can anyone suggest something faster?
Answered by Wera
Very strange, but using the solution from the question "Average of data for every 5 minutes in the given times", we can suggest something like:
select convert(
(min(dt_record) div 50)*50 - 20*((convert(min(dt_record),
datetime) div 50) mod 2), datetime) as dt,
avg(1das4hrz)
from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-14 00:00:00'
group by convert(dt_record, datetime) div 50;
select (
convert(
min(dt_record), datetime) div 50)*50 - 20*(
(convert(min(dt_record), datetime) div 50) mod 2
) as dt,
avg(column) from `your_table`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-14 00:00:00'
group by convert(dt_record, datetime) div 50;
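The divisor is 50 because, although half of a normal minute is 30 seconds, in the integer date format (YYYYMMDDHHMMSS) each minute spans 100 units, so half a minute corresponds to 50. Note that the seconds digits still only run from 00 to 59, so the two groups in each minute cover seconds 00-49 and 50-59 rather than two equal 30-second halves.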
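To see what the DIV 50 expression does, the following Python sketch applies the same integer arithmetic to a few values. It is an illustration only and assumes that MySQL turns a DATETIME into a YYYYMMDDHHMMSS integer when it is used as an operand of DIV; the function name `half_minute_label` is hypothetical.

```python
def half_minute_label(dt_int):
    """Mimic (x DIV 50) * 50 - 20 * ((x DIV 50) MOD 2) on a YYYYMMDDHHMMSS integer."""
    group = dt_int // 50
    return group * 50 - 20 * (group % 2)


base = 20161113050000  # 2016-11-13 05:00:00 written as an integer

for ss in (0, 29, 30, 49, 50, 59):
    x = base + ss
    # Columns: seconds, group key used by GROUP BY, label produced by the select list.
    print(ss, x // 50, half_minute_label(x))
```

The group key changes between seconds 49 and 50, not between 29 and 30, while the `- 20 * (... mod 2)` term relabels the second group from :50 to :30. The labels therefore look like half-minute steps even though the groups themselves are 50 and 10 seconds long.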