在 SQL 中查找时间间隔重叠的行的简单有效方法是什么?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/117962/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:32:13  来源:igfitidea点击:

What is a simple and efficient way to find rows with time-interval overlaps in SQL?

sql

提问by Steven A. Lowe

I have two tables, both with start time and end time fields. I need to find, for each row in the first table, all of the rows in the second table where the time intervals intersect.

我有两个表,都有开始时间和结束时间字段。我需要为第一个表中的每一行找到第二个表中时间间隔相交的所有行。

For example:

例如:

           <-----row 1 interval------->
<---find this--> <--and this--> <--and this-->

Please phrase your answer in the form of a SQL WHERE-clause, AND consider the case where the end time in the second table may be NULL.

请以 SQLWHERE子句的形式表达您的答案,并考虑第二个表中的结束时间可能是 的情况NULL

Target platform is SQL Server 2005, but solutions from other platforms may be of interest also.

目标平台是 SQL Server 2005,但其他平台的解决方案也可能引起关注。

回答by Khoth

SELECT * 
FROM table1,table2 
WHERE table2.start <= table1.end 
AND (table2.end IS NULL OR table2.end >= table1.start)

回答by Lukasz Szozda

"solutions from other platforms may be of interest also."

“来自其他平台的解决方案也可能令人感兴趣。”

SQL Standard defines OVERLAPSpredicate:

SQL 标准定义了OVERLAPS谓词:

Specify a test for an overlap between two events.

<overlaps predicate> ::=
   <row value constructor 1>  OVERLAPS <row value constructor 2> 

为两个事件之间的重叠指定测试。

<overlaps predicate> ::=
   <row value constructor 1>  OVERLAPS <row value constructor 2> 

Example:

例子:

SELECT 1
WHERE ('2020-03-01'::DATE, '2020-04-15'::DATE) OVERLAPS 
      ('2020-02-01'::DATE, '2020-03-15'::DATE) 
-- 1

db<>fiddle demo

db<>小提琴演示

回答by Casper

select * from table_1 
right join 
table_2 on 
(
table_1.start between table_2.start and table_2.[end]
or
table_1.[end] between table_2.start and table_2.[end]
or
(table_1.[end] > table_2.start and table_2.[end] is null)
)

EDIT:Ok, don't go for my solution, it perfoms like shit. The "where" solution is 14x faster. Oops...

编辑:好的,不要选择我的解决方案,它的表现就像狗屎一样。“where”解决方案的速度提高了 14 倍。哎呀...

Some statistics: running on a db with ~ 65000 records for both table 1 and 2 (no indexing), having intervals of 2 days between start and end for each row, running for 2 minutes in SQLSMSE (don't have the patience to wait)

一些统计数据:在表 1 和表 2 的约 65000 条记录的数据库上运行(无索引),每行开始和结束之间的间隔为 2 天,在 SQLSMSE 中运行 2 分钟(没有耐心等待)

Using join: 8356 rows in 2 minutes

使用连接:2 分钟内 8356 行

Using where: 115436 rows in 2 minutes

使用 where:2 分钟内 115436 行

回答by Mike S

It's sound very complicated until you start working from reverse. Below I illustrated ONLY GOOD CASES (no overlaps)! defined by those 2 simple conditions, we have no overlap ranges if condA OR condB is TRUE, so we going to reverse those: NOT condA AND NOT CondB, in our case I just reversed signs (> became <=)

在您从反向开始工作之前,这听起来非常复杂。下面我只说明了好案例(没有重叠)!由这 2 个简单条件定义,如果 condA OR condB 为 TRUE,我们就没有重叠范围,所以我们将颠倒那些:NOT condA AND NOT CondB,在我们的例子中,我只是颠倒了符号(> 变成了 <=)

/*
|--------| A                             \___  CondA: b.ddStart >  a.ddEnd
            |=========| B                /      \____ CondB:  a.ddS >  b.ddE
                          |+++++++++| A         /
*/
--DROP TABLE ran
create table ran ( mem_nbr int, ID int, ddS date, ddE date)
insert ran values  
(100, 1,  '2012-1-1','2012-12-30'),    ----\ ovl
(100, 11, '2012-12-12','2012-12-24'),  ----/
(100, 2, '2012-12-31','2014-1-1'),
(100, 3, '2014-5-1','2014-12-14') ,

(220, 1, '2015-5-5','2015-12-14') ,    ---\ovl
(220, 22, '2014-4-1','2015-5-25') ,    ---/
(220, 3, '2016-6-1','2016-12-16')  

select  DISTINCT a.mem_nbr ,  a.* , '-' [ ], b.dds, b.dde, b.id 
FROM ran a
join ran b  on  a.mem_nbr = b.mem_nbr          -- match by mem#
               AND     a.ID <> b.ID            -- itself
                  AND     b.ddS <= a.ddE        -- NOT    b.ddS >  a.ddE       
                  AND     a.ddS <= b.ddE        -- NOT    a.ddS >  b.ddE   

回答by Sven

And what, if you want to analyse such an overlap on a minute precision with 70m+ rows? the only solution i could make up myself was a time dimension table for the join

那么,如果您想以 70m+ 行的微小精度分析这种重叠呢?我可以自己弥补的唯一解决方案是连接的时间维度表

else the dublicate-handling became a headache .. and the processing cost where astronomical

否则重复处理就成了一个令人头疼的问题……而且处理成本是天文数字