PostgreSQL: Database Design with Change History
Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17075577/
Asked by Delos Chang
I am looking to design a database that keeps track of every set of changes so that I can refer back to them in the future. So for example:
Database A
+====+======+==========+
| ID | Name | Property |
+====+======+==========+
| 1  | Kyle | 30       |
+====+======+==========+
If I change the row's 'property' field to 50, it should update the row to:
| 1  | Kyle | 50       |
But it should save the fact that the row's property was 30 at some point in time. Then if the row is again updated to be 70:
| 1  | Kyle | 70       |
Both facts, that the row's property was 30 and later 50, should be preserved, such that with some query I could retrieve:
| 1  | Kyle | 30       |
| 1  | Kyle | 50       |
It should recognize that these were the "same entries" just at different points in time.
Edit: This history will need to be presented to the user at some point, so ideally there should be an understanding of which rows belong to the same "revision cluster".
What is the best way to approach the design of this database?
Accepted answer by Charles Bretana
One way is to have a MyTableNameHistory table for every table in your database, and make its schema identical to the schema of table MyTableName, except that the Primary Key of the History table has one additional column named effectiveUtc of type DateTime. For example, if you have a table named Employee,
Create Table Employee
(
    employeeId   integer Primary Key Not Null,
    firstName    varChar(20) null,
    lastName     varChar(30) Not null,
    HireDate     smallDateTime null,
    DepartmentId integer null
)
Then the History table would be
Create Table EmployeeHistory
(
    employeeId   integer Not Null,
    effectiveUtc DateTime Not Null,
    firstName    varChar(20) null,
    lastName     varChar(30) Not null,
    HireDate     smallDateTime null,
    DepartmentId integer null,
    Primary Key (employeeId, effectiveUtc)
)
Then, you can put a trigger on the Employee table, so that every time you insert, update, or delete anything in the Employee table, a new record is inserted into the EmployeeHistory table with the exact same values for all the regular fields, and the current UTC datetime in the effectiveUtc column.
Then, to find the values at any point in the past, you just select the record from the history table whose effectiveUtc is the latest value before the as-of datetime you are interested in.
Select * From EmployeeHistory h
Where EmployeeId = @EmployeeId
  And effectiveUtc =
      (Select Max(effectiveUtc)
       From EmployeeHistory
       Where EmployeeId = h.EmployeeId
         And effectiveUtc < @AsOfUtcDate)
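The trigger-plus-lookup approach described above can be sketched end to end. This is a minimal illustration, not the answer's own implementation: it assumes SQLite (through Python's standard sqlite3 module) in place of the answer's SQL dialect so that it runs without a server, it trims Employee down to three columns, and it covers only insert and update triggers. The trigger names and the employee_as_of helper are hypothetical.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE Employee (
    employeeId   INTEGER PRIMARY KEY NOT NULL,
    firstName    TEXT,
    departmentId INTEGER
);
CREATE TABLE EmployeeHistory (
    employeeId   INTEGER NOT NULL,
    effectiveUtc TEXT NOT NULL,
    firstName    TEXT,
    departmentId INTEGER,
    PRIMARY KEY (employeeId, effectiveUtc)
);
-- Copy every inserted row into the history table, stamped with the UTC clock.
CREATE TRIGGER Employee_after_insert AFTER INSERT ON Employee
BEGIN
    INSERT INTO EmployeeHistory (employeeId, effectiveUtc, firstName, departmentId)
    VALUES (NEW.employeeId, strftime('%Y-%m-%d %H:%M:%f', 'now'),
            NEW.firstName, NEW.departmentId);
END;
-- Do the same for the new values of every updated row.
CREATE TRIGGER Employee_after_update AFTER UPDATE ON Employee
BEGIN
    INSERT INTO EmployeeHistory (employeeId, effectiveUtc, firstName, departmentId)
    VALUES (NEW.employeeId, strftime('%Y-%m-%d %H:%M:%f', 'now'),
            NEW.firstName, NEW.departmentId);
END;
"""


def employee_as_of(conn, employee_id, as_of_utc):
    """Return (firstName, departmentId) from the latest history row before as_of_utc."""
    return conn.execute(
        """
        SELECT firstName, departmentId
        FROM EmployeeHistory
        WHERE employeeId = ?
          AND effectiveUtc = (SELECT MAX(effectiveUtc)
                              FROM EmployeeHistory
                              WHERE employeeId = ?
                                AND effectiveUtc < ?)
        """,
        (employee_id, employee_id, as_of_utc),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The pauses give each change a distinct millisecond timestamp; two changes to
# the same row within one millisecond would collide on the history primary key.
conn.execute("INSERT INTO Employee VALUES (1, 'Kyle', 30)")
time.sleep(0.01)
conn.execute("UPDATE Employee SET departmentId = 50 WHERE employeeId = 1")
time.sleep(0.01)
conn.execute("UPDATE Employee SET departmentId = 70 WHERE employeeId = 1")

stamps = [row[0] for row in conn.execute(
    "SELECT effectiveUtc FROM EmployeeHistory ORDER BY effectiveUtc")]

# State just before the last change was made.
print(employee_as_of(conn, 1, stamps[2]))  # ('Kyle', 50)
```

The live Employee table only ever holds the current row, while EmployeeHistory accumulates one row per change, so queries on current data never have to filter out old versions.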
Answer by Luke
To add onto Charles' answer, I would use an Entity-Attribute-Value model instead of creating a different history table for every other table in your database.
Basically, you would create one History table like so:
Create Table History
(
    tableId          varChar(64) Not Null,
    recordId         varChar(64) Not Null,
    changedAttribute varChar(64) Not Null,
    newValue         varChar(64) Not Null,
    effectiveUtc     DateTime Not Null,
    Primary Key (tableId, recordId, changedAttribute, effectiveUtc)
)
Then you would create a History record any time you create or modify data in one of your tables.
To follow your example, when you add 'Kyle' to your Employee table, you would create two records (one for each non-id attribute), and then you would create a new record every time a property changes:
History
+==========+==========+==================+==========+==============+
| tableId  | recordId | changedAttribute | newValue | effectiveUtc |
+==========+==========+==================+==========+==============+
| Employee | 1        | Name             | Kyle     | N            |
| Employee | 1        | Property         | 30       | N            |
| Employee | 1        | Property         | 50       | N+1          |
| Employee | 1        | Property         | 70       | N+2          |
+==========+==========+==================+==========+==============+
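As a rough sketch of how the single History table above might be written to and read back, the following uses SQLite through Python's sqlite3 module. The helper names (record_changes, state_as_of, attribute_history) are hypothetical, and plain integers 0, 1, 2 stand in for the timestamps N, N+1, N+2 from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE History (
    tableId          TEXT NOT NULL,
    recordId         TEXT NOT NULL,
    changedAttribute TEXT NOT NULL,
    newValue         TEXT NOT NULL,
    effectiveUtc     INTEGER NOT NULL,  -- integer stand-in for a UTC timestamp
    PRIMARY KEY (tableId, recordId, changedAttribute, effectiveUtc)
)""")


def record_changes(conn, table_id, record_id, changes, effective_utc):
    """Write one History row per changed attribute."""
    conn.executemany(
        "INSERT INTO History VALUES (?, ?, ?, ?, ?)",
        [(table_id, str(record_id), attribute, str(value), effective_utc)
         for attribute, value in changes.items()],
    )


def state_as_of(conn, table_id, record_id, as_of_utc):
    """Rebuild a record from the latest value of each attribute at or before as_of_utc."""
    rows = conn.execute(
        """
        SELECT changedAttribute, newValue
        FROM History
        WHERE tableId = ? AND recordId = ? AND effectiveUtc <= ?
        ORDER BY effectiveUtc, changedAttribute
        """,
        (table_id, str(record_id), as_of_utc),
    )
    return dict(rows)  # later rows overwrite earlier values of the same attribute


def attribute_history(conn, table_id, record_id, attribute):
    """List (effectiveUtc, newValue) pairs for one attribute, oldest first."""
    return conn.execute(
        """
        SELECT effectiveUtc, newValue
        FROM History
        WHERE tableId = ? AND recordId = ? AND changedAttribute = ?
        ORDER BY effectiveUtc
        """,
        (table_id, str(record_id), attribute),
    ).fetchall()


record_changes(conn, "Employee", 1, {"Name": "Kyle", "Property": 30}, 0)
record_changes(conn, "Employee", 1, {"Property": 50}, 1)
record_changes(conn, "Employee", 1, {"Property": 70}, 2)

print(state_as_of(conn, "Employee", 1, 1))  # {'Name': 'Kyle', 'Property': '50'}
print(attribute_history(conn, "Employee", 1, "Property"))  # [(0, '30'), (1, '50'), (2, '70')]
```

Note that every value comes back as text, because newValue is a single string column; the caller has to know each attribute's real type, which is a common trade-off of the Entity-Attribute-Value layout.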
Alternatively, as a_horse_with_no_name suggested in a comment, if you don't want to store a new History record for every field change, you can store grouped changes (such as changing Name to 'Kyle' and Property to 30 in the same update) as a single record. In this case, you would need to express the collection of changes in JSON or some other blob format. This would merge the changedAttribute and newValue fields into one (changedValues). For example:
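History
+==========+==========+================================+==============+
| tableId  | recordId | changedValues                  | effectiveUtc |
+==========+==========+================================+==============+
| Employee | 1        | { Name: 'Kyle', Property: 30 } | N            |
+==========+==========+================================+==============+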
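A possible way to work with grouped changes is sketched below, again using SQLite through Python's sqlite3 module. Each update is stored as one JSON document, and replaying the documents in order yields the full state of the record at each revision. The helper names record_update and revisions are hypothetical, and integers stand in for timestamps.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE History (
    tableId       TEXT NOT NULL,
    recordId      TEXT NOT NULL,
    changedValues TEXT NOT NULL,     -- JSON object: attribute -> new value
    effectiveUtc  INTEGER NOT NULL,  -- integer stand-in for a UTC timestamp
    PRIMARY KEY (tableId, recordId, effectiveUtc)
)""")


def record_update(conn, table_id, record_id, changed_values, effective_utc):
    """Store all attributes changed by one update as a single History row."""
    conn.execute(
        "INSERT INTO History VALUES (?, ?, ?, ?)",
        (table_id, str(record_id), json.dumps(changed_values), effective_utc),
    )


def revisions(conn, table_id, record_id):
    """Replay the stored updates, returning (effectiveUtc, full state) per revision."""
    state = {}
    result = []
    rows = conn.execute(
        """
        SELECT effectiveUtc, changedValues
        FROM History
        WHERE tableId = ? AND recordId = ?
        ORDER BY effectiveUtc
        """,
        (table_id, str(record_id)),
    )
    for effective_utc, blob in rows:
        state.update(json.loads(blob))              # apply this update's changes
        result.append((effective_utc, dict(state)))  # snapshot after the update
    return result


record_update(conn, "Employee", 1, {"Name": "Kyle", "Property": 30}, 0)
record_update(conn, "Employee", 1, {"Property": 50}, 1)
record_update(conn, "Employee", 1, {"Property": 70}, 2)

for effective_utc, state in revisions(conn, "Employee", 1):
    print(effective_utc, state)
# 0 {'Name': 'Kyle', 'Property': 30}
# 1 {'Name': 'Kyle', 'Property': 50}
# 2 {'Name': 'Kyle', 'Property': 70}
```

Because one row corresponds to one update, each row is naturally one member of the "revision cluster" the question asks about, and JSON keeps the original value types (30 stays a number).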
This is perhaps more difficult than creating a History table for every other table in your database, but it has multiple benefits:
- adding new fields to tables in your database won't require adding the same fields to another table
- fewer tables are used
- it's easier to correlate updates to different tables over time
One architectural benefit of this design is that you are decoupling the concerns of your app and your history/audit capabilities. This design would work just as well as a microservice using a relational or even NoSQL database that is separate from your application database.
Answer by Denis de Bernardy
The best way depends on what you're doing. You want to look more deeply into slowly changing dimensions:
https://en.wikipedia.org/wiki/Slowly_changing_dimension
In Postgres 9.2 and later, don't overlook the tsrange type either. It lets you merge start_date and end_date into a single column, and index it with a GiST index alongside an exclusion constraint to prevent overlapping date ranges.
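The validity-range idea can be illustrated with a small sketch. It is deliberately not PostgreSQL: tsrange and exclusion constraints are PostgreSQL features, so to stay runnable without a server this example uses SQLite through Python's sqlite3 module, with two plain columns (valid_from, valid_to) standing in for the range. Non-overlap is therefore maintained by the application code here rather than enforced by a constraint. The table name, helper names, and dates are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee_versions (
    employee_id INTEGER NOT NULL,
    name        TEXT NOT NULL,
    property    INTEGER NOT NULL,
    valid_from  TEXT NOT NULL,  -- inclusive start of the validity range
    valid_to    TEXT,           -- exclusive end; NULL means "still current"
    PRIMARY KEY (employee_id, valid_from)
)""")


def set_version(conn, employee_id, name, prop, at):
    """Close the current version (if any) at `at` and open a new one starting there."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE employee_versions SET valid_to = ? "
            "WHERE employee_id = ? AND valid_to IS NULL",
            (at, employee_id),
        )
        conn.execute(
            "INSERT INTO employee_versions VALUES (?, ?, ?, ?, NULL)",
            (employee_id, name, prop, at),
        )


def version_at(conn, employee_id, at):
    """Return (name, property) for the version whose range [valid_from, valid_to) contains `at`."""
    return conn.execute(
        "SELECT name, property FROM employee_versions "
        "WHERE employee_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR ? < valid_to)",
        (employee_id, at, at),
    ).fetchone()


set_version(conn, 1, "Kyle", 30, "2013-06-01")
set_version(conn, 1, "Kyle", 50, "2013-06-10")
set_version(conn, 1, "Kyle", 70, "2013-06-20")

print(version_at(conn, 1, "2013-06-15"))  # ('Kyle', 50)
print(version_at(conn, 1, "2013-07-01"))  # ('Kyle', 70)
```

In PostgreSQL the two columns would collapse into one tsrange column, and the database itself could reject overlapping versions of the same employee instead of relying on set_version to keep the ranges contiguous.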
Edit:
there should be an understanding of which rows belong to the same "revision cluster"
In this case you want date ranges in one way or another in your tables, rather than revision numbers or live flags; otherwise you'll end up duplicating related data all over the place.
On a separate note, consider separating the audit tables from the live data, rather than storing everything in the same table. It's harder to implement and manage, but it makes for far more efficient queries on the live data.
See this related post, too: Temporal database design, with a twist (live vs draft rows)
Answer by Igor Romanchenko
One of the ways to log all the changes is to create so-called audit triggers. Such triggers can log any change to the table they are on to a separate log table (that can be queried to see the history of the changes).
Details on the implementation here.