MySQL SELECT / GROUP BY - time segments (10 seconds, 30 seconds, etc.)

Question

Asked by Eric Anderson

I have a table (MySQL) that captures samples every n seconds. The table has many columns, but all that matters for this is two: a time stamp (of type TIMESTAMP) and a count (of type INT).

What I would like to do is get sums and averages of the count column over a range of times. For instance, I have samples recorded every 2 seconds, but I would like the sum of the count column over all the samples in each 10-second or 30-second window.

Here's an example of the data:

+---------------------+-----------------+
| time_stamp          | count           |
+---------------------+-----------------+
| 2010-06-15 23:35:28 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |             942 |
| 2010-06-15 23:35:30 |             180 |
| 2010-06-15 23:35:30 |               4 |
| 2010-06-15 23:35:30 |              52 |
| 2010-06-15 23:35:30 |              12 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:30 |               1 |
| 2010-06-15 23:35:33 |            1468 |
| 2010-06-15 23:35:33 |             247 |
| 2010-06-15 23:35:33 |               1 |
| 2010-06-15 23:35:33 |              81 |
| 2010-06-15 23:35:33 |              16 |
| 2010-06-15 23:35:35 |            1828 |
| 2010-06-15 23:35:35 |             214 |
| 2010-06-15 23:35:35 |              75 |
| 2010-06-15 23:35:35 |               8 |
| 2010-06-15 23:35:37 |            1799 |
| 2010-06-15 23:35:37 |              24 |
| 2010-06-15 23:35:37 |              11 |
| 2010-06-15 23:35:37 |               2 |
| 2010-06-15 23:35:40 |             575 |
| 2010-06-15 23:35:40 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:35 |               2 |
| 2010-06-17 10:39:35 |               1 |
| 2010-06-17 10:39:40 |              35 |
| 2010-06-17 10:39:40 |              19 |
| 2010-06-17 10:39:40 |              37 |
| 2010-06-17 10:39:42 |              64 |
| 2010-06-17 10:39:42 |               3 |
| 2010-06-17 10:39:42 |              31 |
| 2010-06-17 10:39:42 |               7 |
| 2010-06-17 10:39:42 |             246 |
+---------------------+-----------------+

The output I would like (based on the data above) should look like this:

+---------------------+-----------------+
| 2010-06-15 23:35:00 |               1 |  # This is the sum for the 00 - 30 seconds range
| 2010-06-15 23:35:30 |            7544 |  # This is the sum for the 30 - 60 seconds range
| 2010-06-17 10:39:35 |             450 |  # This is the sum for the 30 - 60 seconds range
+---------------------+-----------------+

I have used GROUP BY to gather these numbers by the second, or by the minute, but I can't seem to figure out the syntax to get the sub-minute or range of seconds GROUP BY commands to work correctly.

I am mostly going to be using this query to siphon data from this table into another table.

Thanks!

Answer 1

Answered by Hammerite

GROUP BY UNIX_TIMESTAMP(time_stamp) DIV 30

or, if for some reason you wanted to group them in 20-second intervals, it would be DIV 20, etc. To change the boundaries between GROUP BY values you could use

GROUP BY (UNIX_TIMESTAMP(time_stamp) + r) DIV 30

where r is a literal nonnegative integer less than 30. So

GROUP BY (UNIX_TIMESTAMP(time_stamp) + 5) DIV 30

should give you sums between hh:mm:25 and hh:mm:55 and between hh:mm:55 and hh:mm+1:25, because the boundaries fall where time_stamp + 5 is a multiple of 30. For windows running from hh:mm:05 to hh:mm:35, r would be 25.
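
To make the DIV arithmetic concrete, here is a small Python sketch (not MySQL itself) that applies the same integer division to the sample rows from the question. It assumes the session time zone is UTC; the helper names unix_ts and sum_by_window are made up for illustration, and Python's // on nonnegative integers plays the role of DIV.

```python
from collections import defaultdict
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

# Rows from the question, grouped as (time_stamp, counts recorded at that second).
SAMPLES = [
    ("2010-06-15 23:35:28", [1]),
    ("2010-06-15 23:35:30", [1, 1, 942, 180, 4, 52, 12, 1, 1]),
    ("2010-06-15 23:35:33", [1468, 247, 1, 81, 16]),
    ("2010-06-15 23:35:35", [1828, 214, 75, 8]),
    ("2010-06-15 23:35:37", [1799, 24, 11, 2]),
    ("2010-06-15 23:35:40", [575, 1]),
    ("2010-06-17 10:39:35", [2, 2, 1, 2, 1]),
    ("2010-06-17 10:39:40", [35, 19, 37]),
    ("2010-06-17 10:39:42", [64, 3, 31, 7, 246]),
]

def unix_ts(text):
    """Stand-in for UNIX_TIMESTAMP(); assumes the session time zone is UTC."""
    return int(datetime.strptime(text, FMT).replace(tzinfo=timezone.utc).timestamp())

def sum_by_window(samples, window=30, offset=0):
    """Sum counts per window, keyed by the start time of each window."""
    totals = defaultdict(int)
    for text, counts in samples:
        # Mirrors (UNIX_TIMESTAMP(time_stamp) + r) DIV 30
        bucket = (unix_ts(text) + offset) // window
        totals[bucket] += sum(counts)
    # Undo the offset so each key is the real start of its window.
    return {
        datetime.fromtimestamp(b * window - offset, tz=timezone.utc).strftime(FMT): total
        for b, total in sorted(totals.items())
    }

for start, total in sum_by_window(SAMPLES).items():
    print(start, total)
for start, total in sum_by_window(SAMPLES, offset=5).items():
    print(start, total)
```

With no offset the sketch yields the sums 1, 7544 and 450 from the question, with the last window labelled by its start, 10:39:30. With an offset of 5 the windows start at seconds :25 and :55, so the 23:35:28 sample joins the following rows and the sums become 7545 and 450.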

Answer 2

Answered by mac13k

I tried Hammerite's solution in my project, but it didn't work well where there were missing samples from the series. Here's an example of the query that is supposed to select timestamp (ts), user name and average measure from metric_table and group the results by 27-minute time intervals:

select 
    min(ts), 
    user_name, 
    sum(measure) / 27
from metric_table 
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by unix_timestamp(ts) div 1620, user_name 
order by ts, user_name
;

Note: 27 minutes (in select) = 1620 seconds (in group by), 2160 minutes = 36 hours, i.e. 1.5 days (that's the time range)

When I ran this query against a time series in which samples were recorded irregularly (in other words, for any given time stamp there was no guarantee of finding measure values for all user names), the result rows were not stamped on the interval boundaries (they were not spaced exactly 27 minutes apart). I suspect that was because min(ts) in some groups returned a time stamp greater than the expected floor(ts0 + i*interval). I changed the query to this one:

select 
    from_unixtime(unix_timestamp(ts) - unix_timestamp(ts) mod 1620) as ts1, 
    user_name, 
    sum(measure) / 27
from metric_table
where 
    ts between date_sub('2015-03-17 00:00:00', INTERVAL 2160 MINUTE) and '2015-03-17 00:00:00' 

group by ts1, user_name 
order by ts1, user_name
;

and it works fine even when samples are missing. I think that is because, once the time arithmetic is moved into the select list, ts1 is guaranteed to align with the time steps.
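
The difference between the two labels can be seen in a short Python sketch. It works directly on Unix timestamps, and the sample series (one reading per minute, with the first five minutes of the second interval missing) is invented purely for illustration.

```python
INTERVAL = 1620  # 27 minutes, in seconds

def bucket_labels(timestamps, interval=INTERVAL):
    """Map bucket index -> (min_ts, floored_ts) for a list of Unix timestamps."""
    labels = {}
    for ts in timestamps:
        idx = ts // interval            # unix_timestamp(ts) div 1620
        floored = ts - ts % interval    # unix_timestamp(ts) - unix_timestamp(ts) mod 1620
        # min(ts) is the earliest sample actually present in the bucket.
        min_ts = min(ts, labels[idx][0]) if idx in labels else ts
        labels[idx] = (min_ts, floored)
    return labels

# Hypothetical series: every 60 s, but nothing between 1620 and 1919.
samples = list(range(0, 1620, 60)) + list(range(1620 + 300, 3240, 60))
labels = bucket_labels(samples)
for idx, (min_ts, floored) in sorted(labels.items()):
    print(idx, min_ts, floored)
```

In the second bucket min(ts) is 1920 while the floored value is 1620, so only the floored label stays on the 27-minute grid when samples are missing.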

Answer 3

Answered by Wera

Another solution.

To average over any interval you like, you can convert your datetime column to a Unix timestamp and group by that timestamp rounded down to a multiple of the interval, using modulo (7 seconds in the example).

select FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7
) as dt, avg(1das4hrz) from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00'
group by FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7);

To show how it works, here is a query that displays the intermediate calculations.

select dt_record, minute(dt_record) as mm, SECOND(dt_record) as ss,
UNIX_TIMESTAMP(dt_record) as uxt, UNIX_TIMESTAMP(dt_record) mod 7 as ux7,
FROM_UNIXTIME(
    UNIX_TIMESTAMP(dt_record) - UNIX_TIMESTAMP(dt_record) mod 7) as dtsub,
`column` from `yourtable` where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-13 05:02:00';

+---------------------+--------------------+
| dt                  | avg(column)        |
+---------------------+--------------------+
| 2016-11-13 04:59:43 |  25434.85714285714 |
| 2016-11-13 05:00:42 |  5700.728813559322 |
| 2016-11-13 05:01:41 |  950.1016949152543 |
| 2016-11-13 05:02:40 |  4671.220338983051 |
| 2016-11-13 05:03:39 | 25468.728813559323 |
| 2016-11-13 05:04:38 |  43883.52542372881 |
| 2016-11-13 05:05:37 | 24589.338983050846 |
+---------------------+--------------------+


+---------------------+-----+-----+------------+------+---------------------+----------+
| dt_record           | mm  | ss  | uxt        | ux7  | dtsub               | column   |
+---------------------+-----+-----+------------+------+---------------------+----------+
| 2016-11-13 05:00:00 |   0 |   0 | 1479002400 |    1 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:01 |   0 |   1 | 1479002401 |    2 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:02 |   0 |   2 | 1479002402 |    3 | 2016-11-13 04:59:59 |    36137 |
| 2016-11-13 05:00:03 |   0 |   3 | 1479002403 |    4 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:04 |   0 |   4 | 1479002404 |    5 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:05 |   0 |   5 | 1479002405 |    6 | 2016-11-13 04:59:59 |    34911 |
| 2016-11-13 05:00:06 |   0 |   6 | 1479002406 |    0 | 2016-11-13 05:00:06 |    33726 |
| 2016-11-13 05:00:07 |   0 |   7 | 1479002407 |    1 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:08 |   0 |   8 | 1479002408 |    2 | 2016-11-13 05:00:06 |    32581 |
| 2016-11-13 05:00:09 |   0 |   9 | 1479002409 |    3 | 2016-11-13 05:00:06 |    31475 |
+---------------------+-----+-----+------------+------+---------------------+----------+
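
The uxt, ux7 and dtsub columns above can be reproduced outside MySQL. The following Python sketch assumes a fixed UTC+3 time zone, which is what the uxt values in the table imply; the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

STEP = 7  # seconds per group, as in the query
TZ = timezone(timedelta(hours=3))  # assumption: UTC+3, inferred from the uxt column

def floor_to_step(uxt, step=STEP):
    """UNIX_TIMESTAMP(dt) - UNIX_TIMESTAMP(dt) mod 7"""
    return uxt - uxt % step

def fmt(uxt):
    """Rough equivalent of FROM_UNIXTIME() under the assumed time zone."""
    return datetime.fromtimestamp(uxt, tz=TZ).strftime("%Y-%m-%d %H:%M:%S")

# One row per second: (dt_record, ux7, dtsub)
rows = [
    (fmt(uxt), uxt % STEP, fmt(floor_to_step(uxt)))
    for uxt in range(1479002400, 1479002410)
]
for row in rows:
    print(row)
```

Because the groups are anchored to multiples of 7 since the Unix epoch, they generally do not start on a whole minute, which is why the first group here begins at 04:59:59.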

Can anyone suggest something faster?

Answer 4

Answered by Wera

It looks very strange, but using the solution from this question:

Average of data for every 5 minutes in the given times

We can suggest something like:

select convert(
              (min(dt_record) div 50)*50 - 20*((convert(min(dt_record), 
               datetime) div 50) mod 2), datetime)  as dt, 
       avg(1das4hrz) 
from `meteor-m2_msgi`
where dt_record>='2016-11-13 05:00:00'
       and dt_record < '2016-11-14 00:00:00' 
group by convert(dt_record, datetime) div 50;


select (
convert(
min(dt_record), datetime) div 50)*50 - 20*(
(convert(min(dt_record), datetime) div 50) mod 2
) as dt,
avg(`column`) from `your_table`
where dt_record>='2016-11-13 05:00:00'
and dt_record < '2016-11-14 00:00:00'
group by convert(dt_record, datetime) div 50;

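50 is used because, although half of a normal minute is 30 seconds, the integer date format (YYYYMMDDhhmmss) spans 100 units per minute, so its halfway point is 50. Be aware that seconds only run from 00 to 59 in that format, so DIV 50 groups seconds 00 to 49 together and seconds 50 to 59 together rather than producing two equal 30-second halves.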
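
To see which rows end up together, here is a Python sketch of the integer arithmetic only. It assumes that a DATETIME used in a numeric context behaves like the integer YYYYMMDDhhmmss; as_integer, group_key and label are hypothetical helpers, not MySQL functions.

```python
def as_integer(dt_text):
    """'2016-11-13 05:00:49' -> 20161113050049 (assumed numeric form of a DATETIME)."""
    return int("".join(ch for ch in dt_text if ch.isdigit()))

def group_key(dt_text):
    """convert(dt_record, datetime) div 50"""
    return as_integer(dt_text) // 50

def label(dt_text):
    """(n div 50)*50 - 20*((n div 50) mod 2), the dt expression in the select list.

    The query applies this to min(dt_record); any row of the same group gives
    the same value, because n div 50 is constant within a group.
    """
    n = as_integer(dt_text)
    return (n // 50) * 50 - 20 * ((n // 50) % 2)

for seconds in ("00", "29", "30", "49", "50", "59"):
    text = "2016-11-13 05:00:" + seconds
    print(text, group_key(text), label(text))
```

Under that assumption, seconds 00 to 49 share one key and are labelled hh:mm:00, while seconds 50 to 59 share the next key and are labelled hh:mm:30, so it is worth checking the groups against your own data before relying on this approach.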
