SQL 查询显示多个日期范围之间的差距

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9604400/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 14:43:00  来源:igfitidea点击:

SQL Query to show gaps between multiple date ranges

sqlstored-proceduresgaps-and-islands

提问by Purplegoldfish

Im working on a SSRS / SQL project and trying to write a query to get the gaps between dates and I am completely lost with how to write this.Basically we have a number of devices which can be scheduled for use and I need a report to show when they are not in use.

我正在处理一个 SSRS/SQL 项目并试图编写一个查询来获取日期之间的差距,但我完全不知道如何编写它。基本上我们有许多可以安排使用的设备,我需要一份报告不使用时显示。

I have a table with Device ID, EventStart and EventEnd times, I need to run a query to get the times between these events for each device but I am not really sure how to do this.

我有一个包含设备 ID、EventStart 和 EventEnd 时间的表,我需要运行查询以获取每个设备的这些事件之间的时间,但我不确定如何执行此操作。

For example:

例如:

Device 1 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 1 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`    
Device 1 Event C runs from `02/01/2012 18:00 - 02/01/2012 20:00`    
Device 2 Event A runs from `01/01/2012 08:00 - 01/01/2012 10:00`
Device 2 Event B runs from `01/01/2012 18:00 - 01/01/2012 20:00`

My query should have as its result

我的查询结果应该是

`Device 1 01/01/2012 10:00 - 01/01/2012 18:00`
`Device 1 01/01/2012 20:00 - 02/01/2012 18:00`
`Device 2 01/01/2012 10:00 - 01/01/2012 18:00`

There will be around 4 - 5 devices on average in this table, and maybe 200 - 300 + events.

该表中平均大约有 4 - 5 个设备,可能有 200 - 300 多个事件。

Updates:

更新:

Ok I'll update this to try give a bit more info since I dont seem to have explained this too well (sorry!)

好的,我会更新它以尝试提供更多信息,因为我似乎没有很好地解释这一点(对不起!)

What I am dealing with is a table which has details for Events, Each event is a booking of a flight simulator, We have a number of flight sims( refered to as devices in the table) and we are trying to generate a SSRS report which we can give to a customer to show the days / times each sim is available.

我正在处理的是一个包含事件详细信息的表,每个事件都是一个飞行模拟器的预订,我们有许多飞行模拟器(在表中称为设备),我们正在尝试生成一个 SSRS 报告我们可以给客户显示每个 SIM 卡可用的天数/时间。

So I am going to pass in a start / end date parameter and select all availabilities between those dates. The results should then display as something like:

所以我将传入开始/结束日期参数并选择这些日期之间的所有可用性。结果应显示如下:

Device   Available_From       Available_To
 1       01/01/2012 10:00    01/01/2012 18:00`
 1       01/01/2012 20:00    02/01/2012 18:00`
 2       01/01/2012 10:00    01/01/2012 18:00`

Also Events can sometimes overlap though this is very rare and due to bad data, it doesnt matter about an event on one device overlapping an event on a different device as I need to know availability for each device seperately.

此外,事件有时可能会重叠,尽管这是非常罕见的,并且由于数据错误,一个设备上的事件与不同设备上的事件重叠并不重要,因为我需要分别了解每个设备的可用性。

回答by Branko Dimitrijevic

The Query:

查询:

Assuming the fields containing the interval are named Startand Finish, and the table is named YOUR_TABLE, the query...

假设包含区间的字段被命名为Startand Finish,并且表被命名为YOUR_TABLE,查询...

SELECT Finish, Start
FROM
    (
        SELECT DISTINCT Start, ROW_NUMBER() OVER (ORDER BY Start) RN
        FROM YOUR_TABLE T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM YOUR_TABLE T2
                WHERE T1.Start > T2.Start AND T1.Start < T2.Finish
            )
        ) T1
    JOIN (
        SELECT DISTINCT Finish, ROW_NUMBER() OVER (ORDER BY Finish) RN
        FROM YOUR_TABLE T1
        WHERE
            NOT EXISTS (
                SELECT *
                FROM YOUR_TABLE T2
                WHERE T1.Finish > T2.Start AND T1.Finish < T2.Finish
            )
    ) T2
    ON T1.RN - 1 = T2.RN
WHERE
    Finish < Start

...gives the following result on your test data:

...对您的测试数据给出以下结果:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000

The important property of this query is that it would work on overlappingintervals as well.

此查询的重要属性是它也适用于重叠间隔。



The Algorithm:

算法:

1. Merge Overlapping Intervals

1. 合并重叠区间

The subquery T1accepts only those interval starts that are outside other intervals. The subquery T2does the same for interval ends. This is what removes overlaps.

子查询T1只接受那些在其他区间之外的区间开始。子查询T2对间隔结束执行相同的操作。这就是消除重叠的原因。

The DISTINCTis important in case there are two identical interval starts (or ends) that are bothoutside other intervals. The WHERE Finish < Startsimply eliminates any empty intervals (i.e. duration 0).

DISTINCT如果有两个相同的间隔开始(或结束)在其他间隔之外,这很重要。在WHERE Finish < Start简单地消除了任何空的时间间隔(即,持续时间0)。

We also attach a row number relative to temporal ordering, which will be needed in the next step.

我们还附加了一个与时间顺序相关的行号,这将在下一步中用到。

The T1yields:

T1收益率:

Start                       RN
2012-01-01 08:00:00.000     1
2012-01-01 18:00:00.000     2

The T2yields:

T2收益率:

Finish                      RN
2012-01-01 10:00:00.000     1
2012-01-01 20:00:00.000     2

2. Reconstruct the Result

2. 重构结果

We can now reconstruct either the "active" or the "inactive" intervals.

我们现在可以重建“活动”或“非活动”间隔。

The inactiveintervals are reconstructed by putting together end of the previousinterval with the beginning of the next one, hence - 1in the ONclause. Effectively, we put...

所述非活动间隔由放在一起的的结束重建与下一个的开始的时间间隔,因此- 1ON子句。实际上,我们把...

Finish                      RN
2012-01-01 10:00:00.000     1

...and...

...和...

Start                       RN
2012-01-01 18:00:00.000     2

...together, resulting in:

...一起,导致:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000

(The active intervals could be reconstructed by putting rows from T1alongside rows from T2, by using JOIN ... ON T1.RN = T2.RNand reverting WHERE.)

(可以通过使用和还原将来自 的T1行与行并排放置来重建活动间隔。)T2JOIN ... ON T1.RN = T2.RNWHERE



The Example:

这个例子:

Here is a slightly more realistic example. The following test data:

这是一个稍微更现实的例子。以下测试数据:

Device      Event      Start                      Finish
Device 1    Event A    2012-01-01 08:00:00.000    2012-01-01 10:00:00.000
Device 2    Event B    2012-01-01 18:00:00.000    2012-01-01 20:00:00.000
Device 3    Event C    2012-01-02 11:00:00.000    2012-01-02 15:00:00.000
Device 4    Event D    2012-01-02 10:00:00.000    2012-01-02 12:00:00.000
Device 5    Event E    2012-01-02 10:00:00.000    2012-01-02 15:00:00.000
Device 6    Event F    2012-01-03 09:00:00.000    2012-01-03 10:00:00.000

Gives the following result:

给出以下结果:

Finish                      Start
2012-01-01 10:00:00.000     2012-01-01 18:00:00.000
2012-01-01 20:00:00.000     2012-01-02 10:00:00.000
2012-01-02 15:00:00.000     2012-01-03 09:00:00.000

回答by Mike Ryan

First Answer -- but see below for final one with additional constraints added by OP.

第一个答案——但请参阅下面的最后一个答案,其中 OP 添加了额外的约束。

-- If you want to get the next startTime after the most recent endTime and avoid overlaps, you want something like:

-- 如果你想在最近的 endTime 之后获得下一个 startTime 并避免重叠,你需要类似的东西:

select
    distinct
    e1.deviceId,
    e1.EventEnd,
    e3.EventStart
from Events e1 
join Events e3 on e1.eventEnd < e3.eventStart     /* Finds the next start Time */
and e3.eventStart = (select min(eventStart) from Events e5
                     where e5.eventStart > e1.eventEnd)
and not exists (select *                          /* Eliminates an e1 rows if it is overlapped */
                from Events e5 
                where e5.eventStart < e1.eventEnd
                    and e5.eventEnd > e1.eventEnd)

For the case of your three rows:

对于三行的情况:

INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')

This gives 1 result:

这给出了 1 个结果:

January, 01 2012 11:00:00-0800  January, 01 2012 18:00:00-0800

However, I assume you probably want to match on DeviceId also. In which case, on the joins, you'd add e1.DeviceId = e3.DeviceIdand e1.deviceId = e5.deviceId

但是,我假设您可能还想在 DeviceId 上进行匹配。在这种情况下,在连接上,您将添加e1.DeviceId = e3.DeviceIde1.deviceId = e5.deviceId

SQL Fiddle here: http://sqlfiddle.com/#!3/3899c/8

SQL小提琴在这里:http://sqlfiddle.com/#!3/3899c/8

--

——

OK, final edit. Here's a query adding in deviceIds and adding in a distinct to account for simultenously ending events:

好的,最终编辑。这是一个添加 deviceIds 并添加一个 distinct 以说明同时结束事件的查询:

SELECT distinct
    e1.DeviceID,
    e1.EventEnd as LastEndTime,
    e3.EventStart as NextStartTime
FROM Events e1 
join Events e3 on e1.eventEnd < e3.eventStart
     and e3.deviceId = e1.deviceId
     and e3.eventStart = (select min(eventStart) from Events e5
                     where e5.eventStart > e1.eventEnd
                    and e5.deviceId = e3.deviceId)
where not exists (select * from Events e7 
                    where e7.eventStart < e1.eventEnd
                      and e7.eventEnd > e1.eventEnd
                      and e7.deviceId = e1.deviceId)
order by e1.deviceId, e1.eventEnd

The join to the e3 finds the next start. The join to e5 guarantees that this is the earliest starttime after the current endtime. The join to e7 eliminates a row if the end-time of the considered row is overlapped by a different row.

连接到 e3 找到下一个开始。加入到 e5 保证这是当前结束时间之后的最早开始时间。如果所考虑行的结束时间与另一行重叠,则与 e7 的连接会消除该行。

For this data:

对于此数据:

INSERT INTO Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')
insert into Events values (2, '01/01/2012 09:00', '01/01/2012 11:00')
insert into Events values (2, '01/02/2012 11:00', '01/02/2012 15:00')
insert into Events values (1, '01/02/2012 10:00', '01/02/2012 12:00')
insert into Events values (2, '01/02/2012 10:00', '01/02/2012 15:00')
insert into Events values (2, '01/03/2012 09:00', '01/03/2012 10:00')

You get this result:

你得到这个结果:

1   January, 01 2012 10:00:00-0800  January, 02 2012 10:00:00-0800
2   January, 01 2012 11:00:00-0800  January, 01 2012 18:00:00-0800
2   January, 01 2012 20:00:00-0800  January, 02 2012 10:00:00-0800
2   January, 02 2012 15:00:00-0800  January, 03 2012 09:00:00-0800

SQL Fiddle here: http://sqlfiddle.com/#!3/db0fa/3

SQL小提琴在这里:http://sqlfiddle.com/#!3/db0fa/3

回答by Michael Fredrickson

I'm going to assume that it's not really this simple... but here's a query based on my current understanding of your scenario:

我将假设它并不是那么简单......但这是基于我目前对您的场景的理解的查询:

DECLARE @Events TABLE (
    DeviceID INT,
    EventStart DATETIME,
    EventEnd DATETIME
)

INSERT INTO @Events VALUES (1, '01/01/2012 08:00', '01/01/2012 10:00')
INSERT INTO @Events VALUES (2, '01/01/2012 18:00', '01/01/2012 20:00')

SELECT
    e1.DeviceID,
    e1.EventEnd,
    e2.EventStart
FROM 
    @Events e1 
    JOIN @Events e2 
        ON e2.EventStart = (
            SELECT MIN(EventStart)
            FROM @Events
            WHERE EventStart > e1.EventEnd
        )

回答by bPratik

Does this solve your issue:

这是否解决了您的问题:

The second one seems more relevant

第二个似乎更相关

'There is a table, where two of the columns are DateFrom and DateTo. Both columns contain date and time values. How does one find the missing date ranges or, in other words, all the date ranges that are not covered by any of the entries in the table'.

'有一个表,其中两列是 DateFrom 和 DateTo。两列都包含日期和时间值。如何找到缺失的日期范围,或者换句话说,找到表中任何条目未涵盖的所有日期范围?

回答by Martin Taleski

Here is a Postgres solution that I just did, that does not involve stored procedures:

这是我刚刚做的一个 Postgres 解决方案,它不涉及存储过程:

SELECT minute, sum(case when dp.id is null then 0 else 1 end) as s 
FROM generate_series( 
   '2017-12-28'::timestamp,
   '2017-12-30'::timestamp,
   '1 minute'::interval
) minute 
left outer join device_periods as dp
on minute >= dp.start_date and minute < dp.end_date 
group by minute order by minute

The generate_series function generates a table that has one row for each minute in the date range. You can change the interval to 1 second, to be more precise. It is a postgres specific function, but probably something similar exists in other engines.

generate_series 函数生成一个表,该表在日期范围内的每一分钟都有一行。您可以将间隔更改为 1 秒,以便更精确。这是 postgres 特定的功能,但其他引擎中可能存在类似的功能。

This query will give you all the minutes that are filled, and all that are blank. You can wrap this query in an outer query, that can group by hours, days or do some window function operations to get the exact output as you need it. For my purposes, I only needed to count if there are blanks or not.

此查询将为您提供所有已填写的分钟数,以及所有空白的分钟数。您可以将此查询包装在一个外部查询中,该查询可以按小时、天数或执行一些窗口函数操作来获得您需要的确切输出。为了我的目的,我只需要计算是否有空格。