SQL 查询显示多个日期范围之间的差距
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/9604400/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL Query to show gaps between multiple date ranges
提问by Purplegoldfish
Im working on a SSRS / SQL project and trying to write a query to get the gaps between dates and I am completely lost with how to write this.Basically we have a number of devices which can be scheduled for use and I need a report to show when they are not in use.
我正在处理一个 SSRS/SQL 项目并试图编写一个查询来获取日期之间的差距,但我完全不知道如何编写它。基本上我们有许多可以安排使用的设备,我需要一份报告不使用时显示。
I have a table with Device ID, EventStart and EventEnd times, I need to run a query to get the times between these events for each device but I am not really sure how to do this.
我有一个包含设备 ID、EventStart 和 EventEnd 时间的表,我需要运行查询以获取每个设备的这些事件之间的时间,但我不确定如何执行此操作。
For example:
例如:
Device 1 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 1 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`
Device 1 Event C runs from `02/01/2012 18:00 - 02/01/2012 20:00`
Device 2 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 2 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`
My query should have as its result
我的查询结果应该是
`Device 1 01/01/2012 10:00 - 01/01/2012 18:00`
`Device 1 01/01/2012 20:00 - 02/01/2012 18:00`
`Device 2 01/01/2012 10:00 - 01/01/2012 18:00`
There will be around 4 - 5 devices on average in this table, and maybe 200 - 300 + events.
该表中平均大约有 4 - 5 个设备,可能有 200 - 300 多个事件。
Updates:
更新:
Ok I'll update this to try give a bit more info since I dont seem to have explained this too well (sorry!)
好的,我会更新它以尝试提供更多信息,因为我似乎没有很好地解释这一点(对不起!)
What I am dealing with is a table which has details for Events, Each event is a booking of a flight simulator, We have a number of flight sims( refered to as devices in the table) and we are trying to generate a SSRS report which we can give to a customer to show the days / times each sim is available.
我正在处理的是一个包含事件详细信息的表,每个事件都是一个飞行模拟器的预订,我们有许多飞行模拟器(在表中称为设备),我们正在尝试生成一个 SSRS 报告我们可以给客户显示每个 SIM 卡可用的天数/时间。
So I am going to pass in a start / end date parameter and select all availabilities between those dates. The results should then display as something like:
所以我将传入开始/结束日期参数并选择这些日期之间的所有可用性。结果应显示如下:
Device Available_From Available_To
1 01/01/2012 10:00 01/01/2012 18:00`
1 01/01/2012 20:00 02/01/2012 18:00`
2 01/01/2012 10:00 01/01/2012 18:00`
Also Events can sometimes overlap though this is very rare and due to bad data, it doesnt matter about an event on one device overlapping an event on a different device as I need to know availability for each device seperately.
此外,事件有时可能会重叠,尽管这是非常罕见的,并且由于数据错误,一个设备上的事件与不同设备上的事件重叠并不重要,因为我需要分别了解每个设备的可用性。
回答by Branko Dimitrijevic
The Query:
查询:
Assuming the fields containing the interval are named Start
and Finish
, and the table is named YOUR_TABLE
, the query...
假设包含区间的字段被命名为Start
and Finish
,并且表被命名为YOUR_TABLE
,查询...
SELECT Finish, Start
FROM
(
SELECT DISTINCT Start, ROW_NUMBER() OVER (ORDER BY Start) RN
FROM YOUR_TABLE T1
WHERE
NOT EXISTS (
SELECT *
FROM YOUR_TABLE T2
WHERE T1.Start > T2.Start AND T1.Start < T2.Finish
)
) T1
JOIN (
SELECT DISTINCT Finish, ROW_NUMBER() OVER (ORDER BY Finish) RN
FROM YOUR_TABLE T1
WHERE
NOT EXISTS (
SELECT *
FROM YOUR_TABLE T2
WHERE T1.Finish > T2.Start AND T1.Finish < T2.Finish
)
) T2
ON T1.RN - 1 = T2.RN
WHERE
Finish < Start
...gives the following result on your test data:
...对您的测试数据给出以下结果:
Finish Start
2012-01-01 10:00:00.000 2012-01-01 18:00:00.000
The important property of this query is that it would work on overlappingintervals as well.
此查询的重要属性是它也适用于重叠间隔。
The Algorithm:
算法:
1. Merge Overlapping Intervals
1. 合并重叠区间
The subquery T1
accepts only those interval starts that are outside other intervals. The subquery T2
does the same for interval ends. This is what removes overlaps.
子查询T1
只接受那些在其他区间之外的区间开始。子查询T2
对间隔结束执行相同的操作。这就是消除重叠的原因。
The DISTINCT
is important in case there are two identical interval starts (or ends) that are bothoutside other intervals. The WHERE Finish < Start
simply eliminates any empty intervals (i.e. duration 0).
DISTINCT
如果有两个相同的间隔开始(或结束)都在其他间隔之外,这很重要。在WHERE Finish < Start
简单地消除了任何空的时间间隔(即,持续时间0)。
We also attach a row number relative to temporal ordering, which will be needed in the next step.
我们还附加了一个与时间顺序相关的行号,这将在下一步中用到。
The T1
yields:
该T1
收益率:
Start RN
2012-01-01 08:00:00.000 1
2012-01-01 18:00:00.000 2
The T2
yields:
该T2
收益率:
Finish RN
2012-01-01 10:00:00.000 1
2012-01-01 20:00:00.000 2
2. Reconstruct the Result
2. 重构结果
We can now reconstruct either the "active" or the "inactive" intervals.
我们现在可以重建“活动”或“非活动”间隔。
The inactiveintervals are reconstructed by putting together end of the previousinterval with the beginning of the next one, hence - 1
in the ON
clause. Effectively, we put...
所述非活动间隔由放在一起的的结束重建前与下一个的开始的时间间隔,因此- 1
中ON
子句。实际上,我们把...
Finish RN
2012-01-01 10:00:00.000 1
...and...
...和...
Start RN
2012-01-01 18:00:00.000 2
...together, resulting in:
...一起,导致:
Finish Start
2012-01-01 10:00:00.000 2012-01-01 18:00:00.000
(The active intervals could be reconstructed by putting rows from T1
alongside rows from T2
, by using JOIN ... ON T1.RN = T2.RN
and reverting WHERE
.)
(可以通过使用和还原将来自 的T1
行与行并排放置来重建活动间隔。)T2
JOIN ... ON T1.RN = T2.RN
WHERE
The Example:
这个例子:
Here is a slightly more realistic example. The following test data:
这是一个稍微更现实的例子。以下测试数据:
Device Event Start Finish
Device 1 Event A 2012-01-01 08:00:00.000 2012-01-01 10:00:00.000
Device 2 Event B 2012-01-01 18:00:00.000 2012-01-01 20:00:00.000
Device 3 Event C 2012-01-02 11:00:00.000 2012-01-02 15:00:00.000
Device 4 Event D 2012-01-02 10:00:00.000 2012-01-02 12:00:00.000
Device 5 Event E 2012-01-02 10:00:00.000 2012-01-02 15:00:00.000
Device 6 Event F 2012-01-03 09:00:00.000 2012-01-03 10:00:00.000
Gives the following result:
给出以下结果:
Finish Start
2012-01-01 10:00:00.000 2012-01-01 18:00:00.000
2012-01-01 20:00:00.000 2012-01-02 10:00:00.000
2012-01-02 15:00:00.000 2012-01-03 09:00:00.000
回答by Mike Ryan
First Answer -- but see below for final one with additional constraints added by OP.
第一个答案——但请参阅下面的最后一个答案,其中 OP 添加了额外的约束。
-- If you want to get the next startTime after the most recent endTime and avoid overlaps, you want something like:
-- 如果你想在最近的 endTime 之后获得下一个 startTime 并避免重叠,你需要类似的东西:
select
distinct
e1.deviceId,
e1.EventEnd,
e3.EventStart
from Events e1
join Events e3 on e1.eventEnd < e3.eventStart /* Finds the next start Time */
and e3.eventStart = (select min(eventStart) from Events e5
where e5.eventStart > e1.eventEnd)
and not exists (select * /* Eliminates an e1 rows if it is overlapped */
from Events e5
where e5.eventStart < e1.eventEnd
and e5.eventEnd > e1.eventEnd)
For the case of your three rows:
对于三行的情况:
INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')
This gives 1 result:
这给出了 1 个结果:
January, 01 2012 11:00:00-0800 January, 01 2012 18:00:00-0800
However, I assume you probably want to match on DeviceId also. In which case, on the joins, you'd add e1.DeviceId = e3.DeviceId
and e1.deviceId = e5.deviceId
但是,我假设您可能还想在 DeviceId 上进行匹配。在这种情况下,在连接上,您将添加e1.DeviceId = e3.DeviceId
和e1.deviceId = e5.deviceId
SQL Fiddle here: http://sqlfiddle.com/#!3/3899c/8
SQL小提琴在这里:http://sqlfiddle.com/#!3/3899c/8
--
——
OK, final edit. Here's a query adding in deviceIds and adding in a distinct to account for simultenously ending events:
好的,最终编辑。这是一个添加 deviceIds 并添加一个 distinct 以说明同时结束事件的查询:
SELECT distinct
e1.DeviceID,
e1.EventEnd as LastEndTime,
e3.EventStart as NextStartTime
FROM Events e1
join Events e3 on e1.eventEnd < e3.eventStart
and e3.deviceId = e1.deviceId
and e3.eventStart = (select min(eventStart) from Events e5
where e5.eventStart > e1.eventEnd
and e5.deviceId = e3.deviceId)
where not exists (select * from Events e7
where e7.eventStart < e1.eventEnd
and e7.eventEnd > e1.eventEnd
and e7.deviceId = e1.deviceId)
order by e1.deviceId, e1.eventEnd
The join to the e3 finds the next start. The join to e5 guarantees that this is the earliest starttime after the current endtime. The join to e7 eliminates a row if the end-time of the considered row is overlapped by a different row.
连接到 e3 找到下一个开始。加入到 e5 保证这是当前结束时间之后的最早开始时间。如果所考虑行的结束时间与另一行重叠,则与 e7 的连接会消除该行。
For this data:
对于此数据:
INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')
insert into Events values (2, '01/02/2012 11:00', '01/02/2012 15:00')
insert into Events values (1, '01/02/2012 10:00', '01/02/2012 12:00')
insert into Events values (2, '01/02/2012 10:00', '01/02/2012 15:00')
insert into Events values (2, '01/03/2012 09:00', '01/03/2012 10:00')
You get this result:
你得到这个结果:
1 January, 01 2012 10:00:00-0800 January, 02 2012 10:00:00-0800
2 January, 01 2012 11:00:00-0800 January, 01 2012 18:00:00-0800
2 January, 01 2012 20:00:00-0800 January, 02 2012 10:00:00-0800
2 January, 02 2012 15:00:00-0800 January, 03 2012 09:00:00-0800
SQL Fiddle here: http://sqlfiddle.com/#!3/db0fa/3
SQL小提琴在这里:http://sqlfiddle.com/#!3/db0fa/3
回答by Michael Fredrickson
I'm going to assume that it's not really this simple... but here's a query based on my current understanding of your scenario:
我将假设它并不是那么简单......但这是基于我目前对您的场景的理解的查询:
DECLARE @Events TABLE (
DeviceID INT,
EventStart DATETIME,
EventEnd DATETIME
)
INSERT INTO @Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO @Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
SELECT
e1.DeviceID,
e1.EventEnd,
e2.EventStart
FROM
@Events e1
JOIN @Events e2
ON e2.EventStart = (
SELECT MIN(EventStart)
FROM @Events
WHERE EventStart > e1.EventEnd
)
回答by bPratik
Does this solve your issue:
这是否解决了您的问题:
- http://www.simple-talk.com/sql/t-sql-programming/find-missing-date-ranges-in-sql/
- http://www.simple-talk.com/sql/t-sql-programming/missing-date-ranges--the-sequel/
- http://www.simple-talk.com/sql/t-sql-programming/find-missing-date-ranges-in-sql/
- http://www.simple-talk.com/sql/t-sql-programming/missing-date-ranges--the-sequel/
The second one seems more relevant
第二个似乎更相关
'There is a table, where two of the columns are DateFrom and DateTo. Both columns contain date and time values. How does one find the missing date ranges or, in other words, all the date ranges that are not covered by any of the entries in the table'.
'有一个表,其中两列是 DateFrom 和 DateTo。两列都包含日期和时间值。如何找到缺失的日期范围,或者换句话说,找到表中任何条目未涵盖的所有日期范围?
回答by Martin Taleski
Here is a Postgres solution that I just did, that does not involve stored procedures:
这是我刚刚做的一个 Postgres 解决方案,它不涉及存储过程:
SELECT minute, sum(case when dp.id is null then 0 else 1 end) as s
FROM generate_series(
'2017-12-28'::timestamp,
'2017-12-30'::timestamp,
'1 minute'::interval
) minute
left outer join device_periods as dp
on minute >= dp.start_date and minute < dp.end_date
group by minute order by minute
The generate_series function generates a table that has one row for each minute in the date range. You can change the interval to 1 second, to be more precise. It is a postgres specific function, but probably something similar exists in other engines.
generate_series 函数生成一个表,该表在日期范围内的每一分钟都有一行。您可以将间隔更改为 1 秒,以便更精确。这是 postgres 特定的功能,但其他引擎中可能存在类似的功能。
This query will give you all the minutes that are filled, and all that are blank. You can wrap this query in an outer query, that can group by hours, days or do some window function operations to get the exact output as you need it. For my purposes, I only needed to count if there are blanks or not.
此查询将为您提供所有已填写的分钟数,以及所有空白的分钟数。您可以将此查询包装在一个外部查询中,该查询可以按小时、天数或执行一些窗口函数操作来获得您需要的确切输出。为了我的目的,我只需要计算是否有空格。