SQL 修订的数据库设计?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/39281/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Database Design for Revisions?
提问by Ramesh Soni
We have a requirement in project to store all the revisions(Change History) for the entities in the database. Currently we have 2 designed proposals for this:
我们在项目中要求将实体的所有修订(更改历史记录)存储在数据库中。目前,我们有 2 个为此设计的提案:
e.g. for "Employee" Entity
例如对于“员工”实体
Design 1:
设计1:
-- Holds Employee Entity
"Employees (EmployeeId, FirstName, LastName, DepartmentId, .., ..)"
-- Holds the Employee Revisions in Xml. The RevisionXML will contain
-- all data of that particular EmployeeId
"EmployeeHistories (EmployeeId, DateModified, RevisionXML)"
Design 2:
设计二:
-- Holds Employee Entity
"Employees (EmployeeId, FirstName, LastName, DepartmentId, .., ..)"
-- In this approach we have basically duplicated all the fields on Employees
-- in the EmployeeHistories and storing the revision data.
"EmployeeHistories (EmployeeId, RevisionId, DateModified, FirstName,
LastName, DepartmentId, .., ..)"
Is there any other way of doing this thing?
有没有其他方法可以做这件事?
The problem with the "Design 1" is that we have to parse XML each time when you need to access data. This will slow the process and also add some limitations like we cannot add joins on the revisions data fields.
“设计 1”的问题在于,每次需要访问数据时,我们都必须解析 XML。这将减慢进程并增加一些限制,例如我们无法在修订数据字段上添加连接。
And the problem with the "Design 2" is that we have to duplicate each and every field on all entities (We have around 70-80 entities for which we want to maintain revisions).
“设计 2”的问题是我们必须复制所有实体上的每个字段(我们有大约 70-80 个我们想要维护修订的实体)。
采纳答案by Simon Munro
- Do notput it all in one table with an IsCurrent discriminator attribute. This just causes problems down the line, requires surrogate keys and all sorts of other problems.
- Design 2 does have problems with schema changes. If you change the Employees table you have to change the EmployeeHistories table and all the related sprocs that go with it. Potentially doubles you schema change effort.
- Design 1 works well and if done properly does not cost much in terms of a performance hit. You could use an xml schema and even indexes to get over possible performance problems. Your comment about parsing the xml is valid but you could easily create a view using xquery - which you can include in queries and join to. Something like this...
- 千万不能把它们都放在一个表与IsCurrent鉴别属性。这只会导致问题,需要代理键和各种其他问题。
- 设计 2 确实存在架构更改问题。如果您更改员工表,您必须更改员工历史表和所有与之相关的 sproc。可能会使您的架构更改工作加倍。
- 设计 1 运行良好,如果做得好,不会对性能造成太大影响。您可以使用 xml 模式甚至索引来解决可能的性能问题。您关于解析 xml 的评论是有效的,但您可以使用 xquery 轻松创建视图 - 您可以将其包含在查询中并加入。像这样的东西...
CREATE VIEW EmployeeHistory
AS
, FirstName, , DepartmentId
SELECT EmployeeId, RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)') AS FirstName,
RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)') AS LastName,
RevisionXML.value('(/employee/DepartmentId)[1]', 'integer') AS DepartmentId,
FROM EmployeeHistories
回答by Chris Roberts
I think the key question to ask here is 'Who / What is going to be using the history'?
我认为这里要问的关键问题是“谁/什么将使用历史”?
If it's going to be mostly for reporting / human readable history, we've implemented this scheme in the past...
如果主要是为了报告/人类可读的历史,我们过去已经实施了这个方案......
Create a table called 'AuditTrail' or something that has the following fields...
创建一个名为“AuditTrail”或具有以下字段的表...
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[OldValue] [varchar](5000) NULL,
[NewValue] [varchar](5000) NULL
You can then add a 'LastUpdatedByUserID' column to all of your tables which should be set every time you do an update / insert on the table.
然后,您可以将“LastUpdatedByUserID”列添加到所有表中,每次在表上执行更新/插入时都应设置该列。
You can then add a trigger to every table to catch any insert / update that happens and creates an entry in this table for each field that's changed. Because the table is also being supplied with the 'LastUpdateByUserID' for each update / insert, you can access this value in the trigger and use it when adding to the audit table.
然后,您可以向每个表添加一个触发器以捕获发生的任何插入/更新,并在此表中为每个更改的字段创建一个条目。由于该表还为每次更新/插入提供了“LastUpdateByUserID”,因此您可以在触发器中访问该值并在添加到审计表时使用它。
We use the RecordID field to store the value of the key field of the table being updated. If it's a combined key, we just do a string concatenation with a '~' between the fields.
我们使用 RecordID 字段来存储正在更新的表的键字段的值。如果它是组合键,我们只需在字段之间使用“~”进行字符串连接。
I'm sure this system may have drawbacks - for heavily updated databases the performance may be hit, but for my web-app, we get many more reads than writes and it seems to be performing pretty well. We even wrote a little VB.NET utility to automatically write the triggers based on the table definitions.
我确信这个系统可能有缺点——对于大量更新的数据库,性能可能会受到影响,但对于我的 web 应用程序,我们得到的读取比写入多得多,而且它似乎表现得很好。我们甚至编写了一个小的 VB.NET 实用程序来根据表定义自动编写触发器。
Just a thought!
只是一个想法!
回答by Mark Streatfield
The History Tablesarticle in the Database Programmerblog might be useful - covers some of the points raised here and discusses the storage of deltas.
Database Programmer博客中的History Tables文章可能很有用 - 涵盖了此处提出的一些要点并讨论了增量的存储。
Edit
编辑
In the History Tablesessay, the author (Kenneth Downs), recommends maintaining a history table of at least seven columns:
在History Tables文章中,作者 ( Kenneth Downs) 建议维护一个至少包含七列的历史表:
- Timestamp of the change,
- User that made the change,
- A token to identify the record that was changed (where the history is maintained separately from the current state),
- Whether the change was an insert, update, or delete,
- The old value,
- The new value,
- The delta (for changes to numerical values).
- 更改的时间戳,
- 进行更改的用户,
- 标识已更改记录的令牌(其中历史记录与当前状态分开维护),
- 无论更改是插入、更新还是删除,
- 旧的价值,
- 新的价值,
- 增量(用于更改数值)。
Columns which never change, or whose history is not required, should not be tracked in the history table to avoid bloat. Storing the delta for numerical values can make subsequent queries easier, even though it can be derived from the old and new values.
永远不会更改或不需要其历史记录的列不应在历史记录表中进行跟踪以避免膨胀。存储数值的 delta 可以使后续查询更容易,即使它可以从旧值和新值中导出。
The history table must be secure, with non-system users prevented from inserting, updating or deleting rows. Only periodic purging should be supported to reduce overall size (and if permitted by the use case).
历史表必须是安全的,防止非系统用户插入、更新或删除行。仅应支持定期清除以减小整体大小(如果用例允许)。
回答by Kjetil Watnedal
We have implemented a solution very similar to the solution that Chris Roberts suggests, and that works pretty well for us.
我们已经实施了一个与 Chris Roberts 建议的解决方案非常相似的解决方案,这对我们来说非常有效。
Only difference is that we only store the new value. The old value is after all stored in the previous history row
唯一的区别是我们只存储新值。旧值毕竟存储在前一个历史记录行中
[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[NewValue] [varchar](5000) NULL
Lets say you have a table with 20 columns. This way you only have to store the exact column that has changed instead of having to store the entire row.
假设您有一个包含 20 列的表格。这样,您只需存储已更改的确切列,而不必存储整行。
回答by TMS
Avoid Design 1; it is not very handy once you will need to for example rollback to old versions of the records - either automatically or "manually" using administrators console.
避免设计 1;一旦您需要例如回滚到旧版本的记录,它就不是很方便 - 使用管理员控制台自动或“手动”。
I don't really see disadvantages of Design 2. I think the second, History table should contain all columns present in the first, Records table. E.g. in mysql you can easily create table with the same structure as another table (create table X like Y
). And, when you are about to change structure of the Records table in your live database, you have to use alter table
commands anyway - and there is no big effort in running these commands also for your History table.
我真的没有看到设计 2 的缺点。我认为第二个历史记录表应该包含第一个记录表中存在的所有列。例如,在 mysql 中,您可以轻松创建与另一个表 ( create table X like Y
)具有相同结构的表。而且,当您要更改实时数据库中 Records 表的结构时,alter table
无论如何您都必须使用命令 - 并且为您的 History 表运行这些命令也没有太大的努力。
Notes
笔记
- Records table contains only lastest revision;
- History table contains all previous revisions of records in Records table;
- History table's primary key is a primary key of the Records table with added
RevisionId
column; - Think about additional auxiliary fields like
ModifiedBy
- the user who created particular revision. You may also want to have a fieldDeletedBy
to track who deleted particular revision. - Think about what
DateModified
should mean - either it means where this particular revision was created, or it will mean when this particular revision was replaced by another one. The former requires the field to be in the Records table, and seems to be more intuitive at the first sight; the second solution however seems to be more practical for deleted records (date when this particular revision was deleted). If you go for the first solution, you would probably need a second fieldDateDeleted
(only if you need it of course). Depends on you and what you actually want to record.
- 记录表只包含最新修订;
- History 表包含 Records 表中记录的所有先前修订;
- History 表的主键是Records 表的主键,增加了
RevisionId
列; - 考虑其他辅助字段,例如
ModifiedBy
- 创建特定修订的用户。您可能还需要一个字段DeletedBy
来跟踪谁删除了特定修订。 - 想一想
DateModified
应该是什么意思——要么意味着这个特定修订的创建地点,要么意味着这个特定修订何时被另一个修订所取代。前者要求字段在Records表中,乍一看似乎更直观;然而,第二个解决方案对于已删除的记录(删除此特定修订的日期)似乎更实用。如果您采用第一个解决方案,您可能需要第二个字段DateDeleted
(当然只有在您需要时才需要)。取决于您和您实际想要录制的内容。
Operations in Design 2 are very trivial:
设计 2 中的操作非常简单:
Modify调整- copy the record from Records table to History table, give it new RevisionId (if it is not already present in Records table), handle DateModified (depends on how you interpret it, see notes above)
- go on with normal update of the record in Records table
- 将记录从 Records 表复制到 History 表,给它新的 RevisionId(如果它不存在于 Records 表中),处理 DateModified(取决于你如何解释它,见上面的注释)
- 继续正常更新 Records 表中的记录
- do exactly the same as in the first step of Modify operation. Handle DateModified/DateDeleted accordingly, depending on the interpretation you have chosen.
- 与修改操作的第一步完全相同。根据您选择的解释,相应地处理 DateModified/DateDeleted。
- take highest (or some particular?) revision from History table and copy it to the Records table
- 从 History 表中获取最高(或某些特定的?)修订版并将其复制到 Records 表中
- select from History table and Records table
- think what exactly you expect from this operation; it will probably determine what information you require from DateModified/DateDeleted fields (see notes above)
- 从历史表和记录表中选择
- 想想你对这次手术的期望是什么;它可能会确定您需要从 DateModified/DateDeleted 字段获取哪些信息(请参阅上面的注释)
If you go for Design 2, all SQL commands needed to do that will be very very easy, as well as maintenance! Maybe, it will be much much easier if you use the auxiliary columns (RevisionId
, DateModified
) also in the Records table - to keep both tables at exactly the same structure(except for unique keys)! This will allow for simple SQL commands, which will be tolerant to any data structure change:
如果您选择设计 2,那么执行此操作所需的所有 SQL 命令以及维护都将非常容易!也许,如果您也在 Records 表中使用辅助列 ( RevisionId
, DateModified
)会容易得多- 使两个表保持完全相同的结构(唯一键除外)!这将允许简单的 SQL 命令,它将容忍任何数据结构更改:
insert into EmployeeHistory select * from Employe where ID = XX
Don't forget to use transactions!
不要忘记使用事务!
As for the scaling, this solution is very efficient, since you don't transform any data from XML back and forth, just copying whole table rows - very simple queries, using indices - very efficient!
至于缩放,此解决方案非常有效,因为您无需从 XML 来回转换任何数据,只需复制整个表行 - 非常简单的查询,使用索引 - 非常高效!
回答by ConcernedOfTunbridgeWells
If you have to store history, make a shadow table with the same schema as the table you are tracking and a 'Revision Date' and 'Revision Type' column (e.g. 'delete', 'update'). Write (or generate - see below) a set of triggers to populate the audit table.
如果您必须存储历史记录,请使用与您正在跟踪的表相同的架构以及“修订日期”和“修订类型”列(例如“删除”、“更新”)制作影子表。编写(或生成 - 见下文)一组触发器来填充审计表。
It's fairly straightforward to make a tool that will read the system data dictionary for a table and generate a script that creates the shadow table and a set of triggers to populate it.
制作一个工具来读取表的系统数据字典并生成一个脚本来创建影子表和一组触发器来填充它是相当简单的。
Don't try to use XML for this, XML storage is a lot less efficient than the native database table storage that this type of trigger uses.
不要尝试为此使用 XML,XML 存储的效率远低于此类触发器使用的本机数据库表存储。
回答by aku
Ramesh, I was involved in development of system based on first approach.
It turned out that storing revisions as XML is leading to a huge database growth and significantly slowing things down.
My approach would be to have one table per entity:
Ramesh,我参与了基于第一种方法的系统开发。
事实证明,将修订存储为 XML 会导致巨大的数据库增长并显着减慢速度。
我的方法是每个实体有一张桌子:
Employee (Id, Name, ... , IsActive)
where IsActiveis a sign of the latest version
其中IsActive是最新版本的标志
If you want to associate some additional info with revisions you can create separate table containing that info and link it with entity tables using PK\FK relation.
如果您想将一些附加信息与修订相关联,您可以创建包含该信息的单独表,并使用 PK\FK 关系将其与实体表链接。
This way you can store all version of employees in one table. Pros of this approach:
通过这种方式,您可以将所有版本的员工存储在一张表中。这种方法的优点:
- Simple data base structure
- No conflicts since table becomes append-only
- You can rollback to previous version by simply changing IsActive flag
- No need for joins to get object history
- 简单的数据库结构
- 没有冲突,因为表变为仅附加
- 您只需更改 IsActive 标志即可回滚到以前的版本
- 无需连接即可获取对象历史记录
Note that you should allow primary key to be non unique.
请注意,您应该允许主键不唯一。
回答by Matthew Watson
The way that I've seen this done in the past is have
我过去看到这样做的方式是
Employees (EmployeeId, DateModified, < Employee Fields > , boolean isCurrent );
You never "update" on this table (except to change the valid of isCurrent), just insert new rows. For any given EmployeeId, only 1 row can have isCurrent == 1.
您永远不会在此表上“更新”(除了更改 isCurrent 的有效值),只需插入新行。对于任何给定的 EmployeeId,只有 1 行可以有 isCurrent == 1。
The complexity of maintaining this can be hidden by views and "instead of" triggers (in oracle, I presume similar things other RDBMS), you can even go to materialized views if the tables are too big and can't be handled by indexes).
维护这一点的复杂性可以被视图和“代替”触发器隐藏(在 oracle 中,我假设其他 RDBMS 也有类似的东西),如果表太大并且不能由索引处理,您甚至可以转到物化视图) .
This method is ok, but you can end up with some complex queries.
这个方法没问题,但你最终会得到一些复杂的查询。
Personally, I'm pretty fond of your Design 2 way of doing it, which is how I've done it in the past as well. Its simple to understand, simple to implement and simple to maintain.
就我个人而言,我非常喜欢你设计 2 的方式,我过去也是这样做的。它易于理解、易于实现和易于维护。
It also creates very little overhead for the database and application, especially when performing read queries, which is likely what you'll be doing 99% of the time.
它还为数据库和应用程序创建了很少的开销,尤其是在执行读取查询时,您 99% 的时间都可能会这样做。
It would also be quite easy to automatic the creation of the history tables and triggers to maintain (assuming it would be done via triggers).
自动创建历史表和触发器也很容易维护(假设它是通过触发器完成的)。
回答by Mehran
I'm going to share with you my design and it's different from your both designs in that it requires one table per each entity type. I found the best way to describe any database design is through ERD, here's mine:
我将与您分享我的设计,它与您的两种设计的不同之处在于每种实体类型需要一个表。我发现描述任何数据库设计的最佳方法是通过 ERD,这是我的:
In this example we have an entity named employee. usertable holds your users' records and entityand entity_revisionare two tables which hold revision history for all the entity types that you will have in your system. Here's how this design works:
在这个例子中,我们有一个名为employee的实体。user表保存您的用户记录,entity和entity_revision是两个表,它们保存您将在系统中拥有的所有实体类型的修订历史记录。以下是此设计的工作原理:
The two fields of entity_idand revision_id
entity_id和revision_id两个字段
Each entity in your system will have a unique entity id of its own. Your entity might go through revisions but its entity_id will remain the same. You need to keep this entity id in you employee table (as a foreign key). You should also store the type of your entity in the entitytable (e.g. 'employee'). Now as for the revision_id, as its name shows, it keep track of your entity revisions. The best way I found for this is to use the employee_idas your revision_id. This means you will have duplicate revision ids for different types of entities but this is no treat to me (I'm not sure about your case). The only important note to make is that the combination of entity_id and revision_id should be unique.
您系统中的每个实体都将拥有自己的唯一实体 ID。您的实体可能会经历修订,但它的 entity_id 将保持不变。您需要将此实体 ID 保留在您的员工表中(作为外键)。您还应该在实体表中存储实体的类型(例如“员工”)。现在至于revision_id,正如其名称所示,它会跟踪您的实体修订。我为此找到的最好方法是使用employee_id作为您的revision_id。这意味着您将拥有不同类型实体的重复修订 ID,但这对我来说不是一种享受(我不确定您的情况)。唯一需要注意的是 entity_id 和 revision_id 的组合应该是唯一的。
There's also a statefield within entity_revisiontable which indicated the state of revision. It can have one of the three states: latest
, obsolete
or deleted
(not relying on the date of revisions helps you a great deal to boost your queries).
entity_revision表中还有一个state字段,指示修订的状态。它可以具有以下三种状态之一:,或(不依赖修订日期可以极大地提高查询量)。latest
obsolete
deleted
One last note on revision_id, I didn't create a foreign key connecting employee_id to revision_id because we don't want to alter entity_revision table for each entity type that we might add in future.
关于revision_id的最后一个说明,我没有创建将employee_id连接到revision_id的外键,因为我们不想为将来可能添加的每个实体类型更改entity_revision表。
INSERTION
插入
For each employeethat you want to insert into database, you will also add a record to entityand entity_revision. These last two records will help you keep track of by whom and when a record has been inserted into database.
对于要插入数据库的每个员工,您还将向entity和entity_revision添加一条记录。最后两条记录将帮助您跟踪记录由谁以及何时插入到数据库中。
UPDATE
更新
Each update for an existing employee record will be implemented as two inserts, one in employee table and one in entity_revision. The second one will help you to know by whom and when the record has been updated.
现有员工记录的每次更新都将实现为两次插入,一次在员工表中,一次在 entity_revision 中。第二个将帮助您了解记录由谁更新以及何时更新。
DELETION
删除
For deleting an employee, a record is inserted into entity_revision stating the deletion and done.
对于删除员工,将在 entity_revision 中插入一条记录,说明删除和完成。
As you can see in this design no data is ever altered or removed from database and more importantly each entity type requires only one table. Personally I find this design really flexible and easy to work with. But I'm not sure about you as your needs might be different.
正如您在此设计中看到的那样,没有数据被更改或从数据库中删除,更重要的是,每种实体类型只需要一个表。我个人认为这种设计非常灵活且易于使用。但是我不确定您的情况,因为您的需求可能有所不同。
[UPDATE]
[更新]
Having supported partitions in the new MySQL versions, I believe my design also comes with one of the best performances too. One can partition entity
table using type
field while partition entity_revision
using its state
field. This will boost the SELECT
queries by far while keep the design simple and clean.
在新的 MySQL 版本中支持分区后,我相信我的设计也具有最好的性能之一。可以entity
使用type
字段对表进行分区,而entity_revision
使用其state
字段进行分区。这将大大提高SELECT
查询量,同时保持设计简单和干净。
回答by Henrik Gustafsson
Revisions of data is an aspect of the 'valid-time' concept of a Temporal Database. Much research has gone into this, and many patterns and guidelines have emerged. I wrote a lengthy reply with a bunch of references to thisquestion for those interested.
数据的修订是时态数据库“有效时间”概念的一个方面。对此进行了大量研究,并且出现了许多模式和指南。我写了一篇冗长的回复,并为感兴趣的人提供了对这个问题的大量参考。