我应该在 PostgreSQL 数据库中选择哪种时间戳类型?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6151084/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 22:57:50  来源:igfitidea点击:

Which timestamp type should I choose in a PostgreSQL database?

postgresqltimezonetimestamp

提问by Jerome WAGNER

I would like to define a best practice for storing timestamps in my Postgres database in the context of a multi-timezone project.

我想定义在多时区项目的上下文中将时间戳存储在我的 Postgres 数据库中的最佳实践。

I can

我可以

  1. choose TIMESTAMP WITHOUT TIME ZONEand remember which timezone was used at insertion time for this field
  2. choose TIMESTAMP WITHOUT TIME ZONEand add another field which will contain the name of the timezone that was used at insertion time
  3. choose TIMESTAMP WITH TIME ZONEand insert the timestamps accordingly
  1. 选择TIMESTAMP WITHOUT TIME ZONE并记住在此字段插入时使用的时区
  2. 选择TIMESTAMP WITHOUT TIME ZONE并添加另一个字段,该字段将包含插入时使用的时区名称
  3. 相应地选择TIMESTAMP WITH TIME ZONE并插入时间戳

I have a slight preference for option 3 (timestamp with time zone) but would like to have an educated opinion on the matter.

我对选项 3(带时区的时间戳)略有偏好,但想对此事发表有根据的意见。

回答by Sean

First off, PostgreSQL's time handling and arithmetic is fantastic and Option 3 is fine in the general case. It is, however, an incomplete view of time and timezones and can be supplemented:

首先,PostgreSQL 的时间处理和算术非常棒,选项 3 在一般情况下很好。然而,它是一个不完整的时间和时区视图,可以补充:

  1. Store the name of a user's time zone as a user preference (e.g. America/Los_Angeles, not -0700).
  2. Have user events/time data submitted local to their frame of reference (most likely an offset from UTC, such as -0700).
  3. In application, convert the time to UTCand stored using a TIMESTAMP WITH TIME ZONEcolumn.
  4. Return time requests local to a user's time zone (i.e. convert from UTCto America/Los_Angeles).
  5. Set your database's timezoneto UTC.
  1. 将用户的时区名称存储为用户首选项(例如America/Los_Angeles,不是-0700)。
  2. 将用户事件/时间数据提交到他们的参考框架本地(很可能是 UTC 的偏移量,例如-0700)。
  3. 在应用程序中,将时间转换为UTC使用TIMESTAMP WITH TIME ZONE列存储。
  4. 返回本地用户时区的时间请求(即从UTCto转换America/Los_Angeles)。
  5. 将您的数据库设置timezoneUTC.

This option doesn't always work because it can be hard to get a user's time zone and hence the hedge advice to use TIMESTAMP WITH TIME ZONEfor lightweight applications. That said, let me explain some background aspects of this this Option 4 in more detail.

此选项并不总是有效,因为很难获得用户的时区,因此很难获得TIMESTAMP WITH TIME ZONE用于轻量级应用程序的对冲建议。也就是说,让我更详细地解释这个选项 4 的一些背景方面。

Like Option 3, the reason for the WITH TIME ZONEis because the time at which something happened is an absolutemoment in time. WITHOUT TIME ZONEyields a relativetime zone. Don't ever, ever, ever mix absolute and relative TIMESTAMPs.

与选项 3 一样,原因WITH TIME ZONE是因为某事发生的时间是绝对的时刻。WITHOUT TIME ZONE产生一个相对时区。永远,永远,永远不要混合绝对和相对时间戳。

From a programmatic and consistency perspective, ensure all calculations are made using UTC as the time zone. This isn't a PostgreSQL requirement, but it helps when integrating with other programming languages or environments. Setting a CHECKon the column to make sure the write to the time stamp column has a time zone offset of 0is a defensive position that prevents a few classes of bugs (e.g. a script dumps data to a file and something else sorts the time data using a lexical sort). Again, PostgreSQL doesn't need this to do date calculations correctly or to convert between time zones (i.e. PostgreSQL is very adept at converting times between any two arbitrary time zones). To ensure data going in to the database is stored with an offset of zero:

从编程和一致性的角度来看,确保所有计算都使用 UTC 作为时区。这不是 PostgreSQL 的要求,但它有助于与其他编程语言或环境集成。设置CHECK在列,以确保在写时间戳列有一个时区的偏移0是防守的位置,防止错误的几类(如脚本转储数据存储到文件和其他东西使用排序时数据词法排序)。同样,PostgreSQL 不需要它来正确进行日期计算或在时区之间进行转换(即 PostgreSQL 非常擅长在任意两个任意时区之间转换时间)。为确保进入数据库的数据以零偏移量存储:

CREATE TABLE my_tbl (
  my_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
  CHECK(EXTRACT(TIMEZONE FROM my_timestamp) = '0')
);
test=> SET timezone = 'America/Los_Angeles';
SET
test=> INSERT INTO my_tbl (my_timestamp) VALUES (NOW());
ERROR:  new row for relation "my_tbl" violates check constraint "my_tbl_my_timestamp_check"
test=> SET timezone = 'UTC';
SET
test=> INSERT INTO my_tbl (my_timestamp) VALUES (NOW());
INSERT 0 1

It's not 100% perfect, but it provides a strong enough anti-footshooting measure that makes sure the data is already converted to UTC. There are lots of opinions on how to do this, but this seems to be the best in practice from my experience.

它不是 100% 完美,但它提供了足够强大的反足拍措施,可确保数据已转换为 UTC。关于如何做到这一点有很多意见,但根据我的经验,这似乎是实践中最好的。

Criticisms of database time zone handling is largely justified (there are plenty of databases that handle this with great incompetence), however PostgreSQL's handling of timestamps and timezones is pretty awesome (despite a few "features" here and there). For example, one such feature:

对数据库时区处理的批评在很大程度上是有道理的(有很多数据库在处理这个问题时非常无能),但是 PostgreSQL 对时间戳和时区的处理非常棒(尽管这里和那里有一些“功能”)。例如,一个这样的功能:

-- Make sure we're all working off of the same local time zone
test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT NOW();
              now              
-------------------------------
 2011-05-27 15:47:58.138995-07
(1 row)

test=> SELECT NOW() AT TIME ZONE 'UTC';
          timezone          
----------------------------
 2011-05-27 22:48:02.235541
(1 row)

Note that AT TIME ZONE 'UTC'strips time zone info and creates a relative TIMESTAMP WITHOUT TIME ZONEusing your target's frame of reference (UTC).

请注意,AT TIME ZONE 'UTC'剥离时区信息并TIMESTAMP WITHOUT TIME ZONE使用目标的参考系 ( UTC)创建一个相对值。

When converting from an incomplete TIMESTAMP WITHOUT TIME ZONEto a TIMESTAMP WITH TIME ZONE, the missing time zone is inherited from your connection:

从不完整TIMESTAMP WITHOUT TIME ZONE转换为 时TIMESTAMP WITH TIME ZONE,缺少的时区是从您的连接继承的:

test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT EXTRACT(TIMEZONE_HOUR FROM NOW());
 date_part 
-----------
        -7
(1 row)
test=> SELECT EXTRACT(TIMEZONE_HOUR FROM TIMESTAMP WITH TIME ZONE '2011-05-27 22:48:02.235541');
 date_part 
-----------
        -7
(1 row)

-- Now change to UTC    
test=> SET timezone = 'UTC';
SET
-- Create an absolute time with timezone offset:
test=> SELECT NOW();
              now              
-------------------------------
 2011-05-27 22:48:40.540119+00
(1 row)

-- Creates a relative time in a given frame of reference (i.e. no offset)
test=> SELECT NOW() AT TIME ZONE 'UTC';
          timezone          
----------------------------
 2011-05-27 22:48:49.444446
(1 row)

test=> SELECT EXTRACT(TIMEZONE_HOUR FROM NOW());
 date_part 
-----------
         0
(1 row)

test=> SELECT EXTRACT(TIMEZONE_HOUR FROM TIMESTAMP WITH TIME ZONE '2011-05-27 22:48:02.235541');
 date_part 
-----------
         0
(1 row)

The bottom line:

底线:

  • store a user's time zone as a named label (e.g. America/Los_Angeles) and not an offset from UTC (e.g. -0700)
  • use UTC for everything unless there is a compelling reason to store a non-zero offset
  • treat all non-zero UTC times as an input error
  • never mix and match relative and absolute timestamps
  • also use UTCas the timezonein the database if possible
  • 将用户的时区存储为命名标签(例如America/Los_Angeles)而不是 UTC 的偏移量(例如-0700
  • 除非有令人信服的理由来存储非零偏移量,否则一切都使用 UTC
  • 将所有非零 UTC 时间视为输入错误
  • 永远不要混合和匹配相对和绝对时间戳
  • 还使用UTC作为timezone中如果可能的话数据库

Random programming language note: Python's datetimedata type is very good at maintaining the distinction between absolute vs relative times (albeit frustrating at first until you supplement it with a library like PyTZ).

随机编程语言说明:Python 的datetime数据类型非常擅长区分绝对时间和相对时间(尽管一开始会令人沮丧,直到您使用PyTZ 之类的库对其进行补充)。



EDIT

编辑

Let me explain the difference between relative vs absolute a bit more.

让我再解释一下相对与绝对之间的区别。

Absolute time is used to record an event. Examples: "User 123 logged in" or "a graduation ceremonies start at 2011-05-28 2pm PST." Regardless of your local time zone, if you could teleport to where the event occurred, you could witness the event happening. Most time data in a database is absolute (and therefore should be TIMESTAMP WITH TIME ZONE, ideally with a +0 offset and a textual label representing the rules governing the particular timezone - not an offset).

绝对时间用于记录事件。示例:“用户 123 已登录”或“毕业典礼于太平洋标准时间 2011-05-28 下午 2 点开始”。无论您所在的时区如何,如果您可以传送到事件发生的地方,您就可以见证事件的发生。数据库中的大多数时间数据都是绝对的(因此应该是TIMESTAMP WITH TIME ZONE,理想情况下应该带有 +0 偏移量和表示管理特定时区的规则的文本标签 - 而不是偏移量)。

A relative event would be to record or schedule the time of something from the perspective of a yet-to-be-determined time zone. Examples: "our business's doors open at 8am and close at 9pm", "let's meet every Monday at 7am for a weekly breakfast meeting," or "every Halloween at 8pm." In general, relative time is used in a template or factory for events, and absolute time is used for almost everything else. There is one rare exception that's worth pointing out which should illustrate the value of relative times. For future events that are far enough in the future where there could be uncertainty about the absolute time at which something could occur, use a relative timestamp. Here's a real world example:

相对事件是从尚未确定的时区的角度记录或安排某事的时间。例如:“我们公司的大门在早上 8 点开门,晚上 9 点关门”、“让我们每周一早上 7 点见面,参加每周早餐会”或“每个万圣节晚上 8 点”。一般来说,相对时间用于事件的模板或工厂,而绝对时间用于几乎所有其他事情。有一个罕见的例外值得指出,它应该说明相对时间的价值。对于未来足够远的未来事件,可能不确定某事发生的绝对时间,请使用相对时间戳。这是一个真实世界的例子:

Suppose it's the year 2004 and you need to schedule a delivery on October 31st in 2008 at 1pm on the West Coast of the US (i.e. America/Los_Angeles/PST8PDT). If you stored that using absolute time using '2008-10-31 21:00:00.000000+00'::TIMESTAMP WITH TIME ZONE, the delivery would have shown up at 2pm because the US Government passed the Energy Policy Act of 2005that changed the rules governing daylight savings time. In 2004 when the delivery was scheduled, the date 10-31-2008would have been Pacific Standard Time (+8000), but starting in year 2005+ timezone databases recognized that 10-31-2008would have been Pacific Daylight Savings time (+0700). Storing a relative timestamp with the time zone would have resulted in a correct delivery schedule because a relative timestamp is immune to Congress' ill-informed tampering. Where the cutoff between using relative vs absolute times for scheduling things is, is a fuzzy line, but my rule of thumb is that scheduling for anything in the future further than 3-6mo should make use of relative timestamps (scheduled = absolute vs planned = relative ???).

假设现在是 2004 年,您需要安排在 2008 年 10 月 31 日下午 1 点在美国西海岸(即America/Los_Angeles/ PST8PDT)交货。如果您使用绝对时间存储该数据'2008-10-31 21:00:00.000000+00'::TIMESTAMP WITH TIME ZONE,则交付将在下午 2 点显示,因为美国政府通过了2005 年能源政策法案,法案更改了管理夏令时的规则。在 2004 年计划交付时,日期10-31-2008将是太平洋标准时间 ( +8000),但从 2005 年开始,时区数据库认识到这10-31-2008将是太平洋夏令时 (+0700)。使用时区存储相对时间戳会导致正确的交付时间表,因为相对时间戳不受国会不知情篡改的影响。使用相对时间和绝对时间进行调度之间的界限是模糊的,但我的经验法则是,未来 3-6 个月之后的任何事情的调度都应该使用相对时间戳(已调度 = 绝对 vs 计划 =相对的 ???)。

The other/last type of relative time is the INTERVAL. Example: "the session will time out 20 minutes after a user logs in". An INTERVALcan be used correctly with either absolute timestamps (TIMESTAMP WITH TIME ZONE) or relative timestamps (TIMESTAMP WITHOUT TIME ZONE). It is equally correct to say, "a user session expires 20min after a successful login (login_utc + session_duration)" or "our morning breakfast meeting can only last 60 minutes (recurring_start_time + meeting_length)".

另一种/最后一种类型的相对时间是INTERVAL. 示例:“会话将在用户登录后 20 分钟超时”。AnINTERVAL可以与绝对时间戳 ( TIMESTAMP WITH TIME ZONE) 或相对时间戳 ( TIMESTAMP WITHOUT TIME ZONE)一起正确使用。同样正确的说法是“用户会话在成功登录后 20 分钟到期(login_utc + session_duration)”或“我们的早餐会议只能持续 60 分钟(recurring_start_time + meeting_length)”。

Last bits of confusion: DATE, TIME, TIME WITHOUT TIME ZONEand TIME WITH TIME ZONEare all relative data types. For example: '2011-05-28'::DATErepresents a relative date since you have no time zone information which could be used to identify midnight. Similarly, '23:23:59'::TIMEis relative because you don't know either the time zone or the DATErepresented by the time. Even with '23:59:59-07'::TIME WITH TIME ZONE, you don't know what the DATEwould be. And lastly, DATEwith a time zone is not in fact a DATE, it is a TIMESTAMP WITH TIME ZONE:

混乱的最后位:DATETIMETIME WITHOUT TIME ZONETIME WITH TIME ZONE都是相对的数据类型。例如:'2011-05-28'::DATE表示相对日期,因为您没有可用于标识午夜的时区信息。同样,'23:23:59'::TIME是相对的,因为您既不知道时区,也不知道时间所DATE代表的时间。即使有'23:59:59-07'::TIME WITH TIME ZONE,你也不知道那DATE会是什么。最后,DATE时区实际上不是 a DATE,而是 a TIMESTAMP WITH TIME ZONE

test=> SET timezone = 'America/Los_Angeles';
SET
test=> SELECT '2011-05-11'::DATE AT TIME ZONE 'UTC';
      timezone       
---------------------
 2011-05-11 07:00:00
(1 row)

test=> SET timezone = 'UTC';
SET
test=> SELECT '2011-05-11'::DATE AT TIME ZONE 'UTC';
      timezone       
---------------------
 2011-05-11 00:00:00
(1 row)

Putting dates and time zones in databases is a good thing, but it is easy to get subtly incorrect results.Minimal additional effort is required to store time information correctly and completely, however that doesn't mean the extra effort is always required.

将日期和时区放入数据库是一件好事,但很容易得到微妙的错误结果。正确和完整地存储时间信息需要最少的额外工作,但这并不意味着总是需要额外的工作。

回答by Jay

Sean's answer is overly complex and misleading.

肖恩的回答过于复杂且具有误导性。

The fact is that both "WITH TIME ZONE" and "WITHOUT TIME ZONE" store the value as a unix-like absolute UTC timestamp. The difference is all in how the timestamp is displayed. When "WITH time zone" then the displayed value is the UTC stored value translated to the user's zone. When "WITHOUT time zone" the UTC stored value is twisted so as to show the same clock face no matter what zone the user has set".

事实是,“WITH TIME ZONE”和“WITHOUT TIME ZONE”都将值存储为类 Unix 的绝对 UTC 时间戳。不同之处在于时间戳的显示方式。当“WITH time zone”时,显示的值是转换为用户时区的UTC存储值。当“WITHOUT time zone”时,UTC 存储值被扭曲,以便无论用户设置什么时区都显示相同的钟面”。

The only situation where a "WITHOUT time zone" is usable is when a clock face value is applicable regardless of actual zone. For example, when a timestamp indicates when voting booths might close (ie. they close at 20:00 regardless of a person's timezone).

“WITHOUT time zone”可用的唯一情况是钟面值适用于实际区域。例如,当时间戳指示投票站可能关闭的时间时(即,无论一个人的时区如何,它们都在 20:00 关闭)。

Use choice 3. Always use "WITH time zone" unless there is a very specific reason not to.

使用选项 3。除非有非常具体的原因,否则始终使用“WITH time zone”。

回答by GordonM

My preference is towards option 3, as Postgres can then do al ot of the work recalculating timestamps relative to timezone for you, whereas with the other two you'll have to do that yourself. The extra storage overhead of storing the timestamp with a timezone is really negligible unless you're talking millions of records, in which case you probably already have pretty meaty storage requirements anyway.

我更喜欢选项 3,因为 Postgres 可以为您重新计算相对于时区的时间戳,而对于其他两个,您必须自己完成。使用时区存储时间戳的额外存储开销实际上可以忽略不计,除非您正在谈论数百万条记录,在这种情况下,无论如何您可能已经有相当多的存储需求。