postgresql 数据库内容版本控制

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6365710/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-20 23:05:05  来源:igfitidea点击:

Database Content Versioning

postgresqlversioningmvcc

提问by

I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.

我有兴趣保留数据库中某些表上发生的每个更改的运行历史记录,从而能够重建数据库的历史状态以进行分析。

I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it? Is there a better way?

我正在使用 Postgres,这个 MVCC 的东西似乎我应该能够为此目的利用它,但我找不到任何文档来支持这一点。我可以做吗?有没有更好的办法?

Any input is appreciated!

任何输入表示赞赏!

UPD

UPD

I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:

我已将丹尼斯的回答标记为答案,因为他确实回答了 MVCC 是否是我想要的问题。但是,我已经确定的策略在下面详细说明,以防有人觉得它有用:

The Postgres feature that does what I want: online backup/point in time recovery.

Postgres 功能可以满足我的需求:在线备份/时间点恢复。

http://www.postgresql.org/docs/8.1/static/backup-online.htmlexplains how to use this feature but essentially you can set this "write ahead log" to archive mode, take a snapshot of the database (say, before it goes live), then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).

http://www.postgresql.org/docs/8.1/static/backup-online.html解释了如何使用此功能,但基本上您可以将此“预写日志”设置为存档模式,拍摄数据库快照(例如,在它上线之前),然后不断地归档 WAL。然后,您可以随时使用日志重放来调用数据库的状态,如果您愿意(通过在备用服务器上不断重放新的 WAL),还可以使用热备用。

Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query, however it looks extremely easy to set up and loses zero information. That means when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system to a more elegant system.

也许这种方法不像其他保存历史的方法那么优雅,因为您需要为要查询的每个时间点实际构建数据库,但是它看起来非常容易设置并且丢失零信息。这意味着当我有时间改进对历史数据的处理时,我将拥有一切,因此能够将我笨拙的系统转变为更优雅的系统。

One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for the specific application- if this were not the case I would only be capturing "transaction time".

使其如此完美的一个关键事实是,我的“有效时间”与特定应用程序的“交易时间”相同——如果不是这种情况,我将只捕获“交易时间”。

Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.

在我发现 WAL 之前,我正在考虑只拍摄每日快照或其他东西,但涉及的大尺寸要求和数据丢失并不适合我。

For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.

为了快速启动和运行而不从一开始就影响我的数据保留,这似乎是完美的解决方案。

采纳答案by Denis de Bernardy

I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it?

我正在使用 Postgres,这个 MVCC 的东西似乎我应该能够为此目的利用它,但我找不到任何文档来支持这一点。我可以做吗?

Not really. There are tools to see dead rows, because auto-vacuuming is so that will eventually be reclaimed.

并不真地。有一些工具可以查看死行,因为自动清空最终会被回收。

Is there a better way?

有没有更好的办法?

If I get your question right, you're looking into logging slowly changing dimensions.

如果我的问题是正确的,那么您正在考虑记录缓慢变化的维度

You might find this recent related thread interesting:

您可能会发现这个最近的相关主题很有趣:

Temporal database design, with a twist (live vs draft rows)

时态数据库设计,稍有不同(实时行与草稿行)

回答by Craig Ringer

Time Travel

时间旅行

PostgreSQL used tohave just this feature, and called it "Time Travel". See the old documentation.

PostgreSQL的使用刚才这个功能,并称之为“时间之旅”。请参阅旧文档

There's somewhat similar functionality in the spi contrib modulethat you might want to check out.

您可能想查看spi contrib 模块中的一些类似功能。

Composite type audit trigger

复合型审计触发器

What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:

我通常做的是使用触发器将更改和时间戳一起记录到存档表中,然后查询这些。如果表结构不会改变,您可以使用以下内容:

CREATE TABLE sometable_history(
    command_tag text not null check (command_tag IN ('INSERT','DELETE','UPDATE','TRUNCATE')),
    new_content sometable,
    change_time timestamp with time zone
);

and your versioning trigger can just insert into sometable_history(TG_OP,NEW,current_timestamp)(with a different CASEfor DELETE, where NEWis not defined).

并且您的版本控制触发器可以insert into sometable_history(TG_OP,NEW,current_timestamp)(使用不同的CASEfor DELETE, whereNEW未定义)。

hstore audit trigger

hstore 审计触发器

That gets painful if the schema changes to add new NOT NULLcolumns though. If you expect to do anything like that consider using a hstoreto archive the columns, instead of a composite type. I've already added an implementation of that on the PostgreSQL wiki already.

但是,如果架构更改以添加新NOT NULL列,那会很痛苦。如果您希望做类似的事情,请考虑使用 ahstore来存档列,而不是复合类型。我已经在 PostgreSQL wiki 上添加了一个实现

PITR

PITR

If you want to avoid impact on your master database (growing tables, etc), you can alternately use continuous archiving and point-in-time recoveryto log WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and they include not only the tuples you changed, but VACUUMactivity and other details. You'll want to run them through clearxlogtailsince they can have garbage data on the end if they're partial segments from an archive timeout, then you'll want to compress them heavily for long term storage.

如果您想避免对主数据库的影响(增长表等),您可以交替使用连续归档和时间点恢复来记录 WAL 文件,这些文件可以使用recovery.conf,在任何时间重播。请注意,WAL 文件很大,它们不仅包含您更改的元组,还包含VACUUM活动和其他详细信息。您将希望通过clearxlogtail运行它们,因为如果它们是存档超时的部分段,它们最后可能会有垃圾数据,那么您将需要对它们进行大量压缩以进行长期存储。

回答by gsiems

I'm not aware of any tools/products that are built for that purpose.

我不知道有任何为此目的而构建的工具/产品。

While this may not be exactly what you're asking for, you can configure Postgresql to log ddl changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what ddl changes and when.

虽然这可能不是您所要求的,但您可以配置 Postgresql 以记录 ddl 更改。设置 log_line_prefix 参数(尝试包括 %d、%m 和 %u)并将 log_statement 参数设置为 ddl 应该会给你一个合理的历史记录,谁做了什么 ddl 更改以及何时更改。

Having said that, I don't believe logging ddl to be foolproof. For example, consider a situation where:

话虽如此,我不相信记录 ddl 是万无一失的。例如,考虑这样一种情况:

  1. Multiple schemas have a table with the same name,
  2. one of the tables is altered, and
  3. the ddl doesn't fully qualify the table name (relying on the search path to get it right),
  4. then it may not be possible to know from the log which table was actually altered.
  1. 多个模式有一个同名的表,
  2. 其中一张表被更改,并且
  3. ddl 不完全限定表名(依靠搜索路径使其正确),
  4. 那么可能无法从日志中知道实际更改了哪个表。

Another option might be to log ddl as above but then have a watcher program perform a pg_dump of the database schema whenever a ddl entry get's logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.

另一种选择可能是像上面那样记录 ddl,然后让观察程序在记录 ddl 条目时执行数据库模式的 pg_dump。您甚至可以将新转储与先前转储进行比较,并仅提取发生更改的对象。