在 PostgreSQL 中查找重叠的日期范围
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4480715/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Find overlapping date ranges in PostgreSQL
提问by aocferreira
Is this correct?
这样对吗?
SELECT *
FROM contract
JOIN team USING (name_team)
JOIN player USING(name_player)
WHERE name_team = ?
AND DATE_PART('YEAR',date_join)>= ?
AND DATE_PART('YEAR',date_leave)<= ?
My table contract
has the player name, team name and the dates when he joined and left the club.
I want to make a function listing all players that were on the team in specific years.
The above query doesn't seem to be working ...
我的表上contract
有球员姓名、球队名称以及他加入和离开俱乐部的日期。
我想制作一个功能,列出特定年份球队中的所有球员。
上面的查询似乎不起作用......
采纳答案by Scott Marlowe
Why not use between without the date part thing:
为什么不在没有日期部分的情况下使用:
WHERE datefield BETWEEN '2009-10-10 00:00:00' AND '2009-10-11 00:00:00'
or something like that?
或类似的东西?
回答by Erwin Brandstetter
The currently accepted answerdoes not answer the question. And it is wrong in principle. a BETWEEN x AND y
translates to:
在目前接受的答案不回答这个问题。而且原则上是错误的。a BETWEEN x AND y
翻译成:
a >= x AND a <= y
Includingthe upper border, while people typically need to excludeit:
包括上边界,而人们通常需要排除它:
a >= x AND a < y
With datesyou can easily adjust. For the year 2009 use '2009-12-31' as upper border.
But it's not as simple with timestampswhich allow fractional digits. Modern Postgres versions use an 8-byte integer internally to store up to 6 fractional seconds (μs resolution). Knowing this we couldstill make it work, but that's not intuitive and depends on an implementation detail. Bad idea.
有了日期,您可以轻松调整。对于 2009 年,使用“2009-12-31”作为上边界。
但是对于允许小数位数的时间戳,它并不那么简单。现代 Postgres 版本在内部使用 8 字节整数来存储最多 6 个小数秒(μs 分辨率)。知道了这一点,我们可能仍然使它的工作,但是这并不直观,依赖于实现细节。馊主意。
Moreover, a BETWEEN x AND y
does not find overlapping ranges. We need:
此外,a BETWEEN x AND y
没有找到重叠的范围。我们需要:
b >= x AND a < y
And players that never leftare not considered, yet.
从未离开的球员不被考虑在内。
Proper answer
正确答案
Assuming the year 2009
, I'll rephrase the question without changing its meaning:
假设为 year 2009
,我将在不改变其含义的情况下重新表述这个问题:
"Find all players of a given team who joined before 2010 and did not leave before 2009."
“查找给定球队中所有在 2010 年之前加入但在 2009 年之前没有离开的球员。”
Basic query:
基本查询:
SELECT p.*
FROM team t
JOIN contract c USING (name_team)
JOIN player p USING (name_player)
WHERE t.name_team = ?
AND c.date_join < date '2010-01-01'
AND c.date_leave >= date '2009-01-01';
But there is more:
但还有更多:
If referential integrity is enforced with FK constraints, the table team
itself is just noise in the query and can be removed.
如果使用 FK 约束强制执行参照完整性,则表team
本身只是查询中的噪音,可以删除。
While the same player can leave and rejoin the same team, we also need to fold possible duplicates, for instance with DISTINCT
.
虽然同一个玩家可以离开并重新加入同一个团队,但我们还需要折叠可能的重复项,例如使用DISTINCT
.
And we mayneed to provide for a special case: players that never left. Assuming those players have NULL in date_leave
.
我们可能需要提供一种特殊情况:从未离开的球员。假设这些玩家在date_leave
.
"A player that is not known to have left is assumed to be playing for the team to this day."
“一个不知道已经离开的球员被认为是为球队效力至今。”
Refined query:
细化查询:
SELECT DISTINCT p.*
FROM contract c
JOIN player p USING (name_player)
WHERE c.name_team = ?
AND c.date_join < date '2010-01-01'
AND (c.date_leave >= date '2009-01-01' OR c.date_leave IS NULL);
Operator precedenceworks against us, AND
binds before OR
. We need parentheses.
运算符优先级对我们不利,AND
在 之前绑定OR
。我们需要括号。
Related answer with optimized DISTINCT
(if duplicates are common):
优化的相关答案DISTINCT
(如果重复很常见):
Typically, namesof natural persons aren't unique and a surrogate primary key is used. But, obviously, name_player
is the primary key of player
. If all you need is player names we don't need the table player
in the query, either:
通常,自然人的姓名不是唯一的,并且使用代理主键。但是,显然,name_player
是 的主键player
。如果您只需要玩家姓名,我们也不需要player
查询中的表,或者:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND date_join < date '2010-01-01'
AND (date_leave >= date '2009-01-01' OR date_leave IS NULL);
SQL OVERLAPS
operator
SQLOVERLAPS
运算符
OVERLAPS
automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open intervalstart <= time < end
, unlessstart
andend
are equal in which case it represents that single time instant.
OVERLAPS
自动以该对的较早值作为开始。每个时间段都被认为代表半开区间start <= time < end
,除非start
和end
相等,在这种情况下它代表单个时间瞬间。
To take care of potential NULL
values, COALESCE
seems easiest:
要处理潜在的NULL
价值,COALESCE
似乎最简单:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND (date_join, COALESCE(date_leave, CURRENT_DATE)) OVERLAPS
(date '2009-01-01', date '2010-01-01'); -- upper bound excluded
Range type with index support
带索引支持的范围类型
In Postgres 9.2 or lateryou can also operate with actual range types:
在 Postgres 9.2 或更高版本中,您还可以使用实际范围类型进行操作:
SELECT DISTINCT name_player
FROM contract
WHERE name_team = ?
AND daterange(date_join, date_leave) &&
daterange '[2009-01-01,2010-01-01)'; -- upper bound excluded
Range types add some overhead and occupy more space. 2 x date
= 8 bytes; 1 x daterange
= 14 bytes on disk or 17 bytes in RAM. But in combination with the overlap operator &&
the query can be supported with a GiST index.
范围类型会增加一些开销并占用更多空间。2 x date
= 8 个字节;1 x daterange
= 磁盘上的 14 个字节或 RAM 中的 17 个字节。但是结合重叠运算符&&
,可以使用 GiST 索引支持查询。
Also, no need to special-case NULL values. NULL means "open range" in a range type - exactly what we need. The table definition doesn't even have to change: we can create the range type on the fly - and support the query with a matching expression index:
此外,不需要特殊情况下的 NULL 值。NULL 表示范围类型中的“开放范围”——正是我们所需要的。表定义甚至不必更改:我们可以动态创建范围类型 - 并支持具有匹配表达式索引的查询:
CREATE INDEX mv_stock_dr_idx ON mv_stock USING gist (daterange(date_join, date_leave));
Related:
有关的: