Note: this content comes from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3772933/

Time: 2020-09-01 07:38:16  Source: igfitidea

Versioning in SQL Tables - how to handle it?

Tags: sql, orm, versioning

Asked by corsiKa

Here's a fictional scenario with some populated data. For tax purposes, my fictional company must retain records of historical data. For this reason, I've added a version column to the table.

TABLE EMPLOYEE: (with personal commentary)

|ID | VERSION | NAME       | Position | PAY |
+---+---------+------------+----------+-----+
| 1 |    1    | John Doe   | Owner    | 100 | Started company
| 1 |    2    | John Doe   | Owner    |  80 | Pay cut to hire a coder
| 2 |    1    | Mark May   | Coder    |  20 | Hire said coder
| 2 |    2    | Mark May   | Coder    |  30 | Productive coder gets raise
| 3 |    1    | Jane Field | Admn Asst|  15 | Need office staff
| 2 |    3    | Mark May   | Coder    |  35 | Productive coder gets raise
| 1 |    3    | John Doe   | Owner    | 120 | Sales = profit for owner!
| 3 |    2    | Jane Field | Admn Asst|  20 | Raise for office staff
| 4 |    1    | Cody Munn  | Coder    |  20 | Hire another coder
| 4 |    2    | Cody Munn  | Coder    |  25 | Give that coder raise
| 3 |    3    | Jane Munn  | Admn Asst|  20 | Jane marries Cody <3
| 2 |    4    | Mark May   | Dev Lead |  40 | Promote mark to Dev Lead
| 4 |    3    | Cody Munn  | Coder    |  30 | Give Cody a raise
| 2 |    5    | Mark May   | Retired  |   0 | Mark retires
| 5 |    1    | Joey Trib  | Dev Lead |  40 | Bring outside help for Dev Lead
| 6 |    1    | Hire Meplz | Coder    |  10 | Hire a cheap coder
| 3 |    4    | Jane Munn  | Retired  |   0 | Jane quits
| 7 |    1    | Work Fofre | Admn Asst|  10 | Hire Janes replacement
| 8 |    1    | Fran Hesky | Coder    |  10 | Hire another coder
| 9 |    1    | Deby Olav  | Coder    |  25 | Hire another coder
| 4 |    4    | Cody Munn  | VP Ops   |  80 | Promote Cody
| 9 |    2    | Deby Olav  | VP Ops   |  80 | Cody fails at VP Ops, promote Deby
| 4 |    5    | Cody Munn  | Retired  |   0 | Cody retires in shame
| 5 |    2    | Joey Trib  | Dev Lead |  50 | Give Joey a raise
+---+---------+------------+----------+-----+

Now, if I wanted to do something like "Get a list of the current coders", I couldn't just do SELECT * FROM EMPLOYEE WHERE Position = 'Coder' because that would return lots of historical data... which is bad.

I'm looking for good ideas to handle this scenario. I see a few options that jump out at me, but I'm sure someone's going to say "Wow, that's a rookie mistake, glow... try this on for size:" which is what this place is all about, right? :-)

Idea number 1: Keep a version table with the current version, like this:

TABLE EMPLOYEE_VERSION:

|ID |VERSION|
+---+-------+
| 1 |   3   |
| 2 |   5   |
| 3 |   4   |
| 4 |   5   |
| 5 |   2   |
| 6 |   1   |
| 7 |   1   |
| 8 |   1   |
| 9 |   2   |     
+---+-------+

Although I'm not sure how I'd do that with a single query, I'm sure it could be done, and I bet I could figure it out with a rather small amount of effort.

Of course, I would have to update this table every time I insert into the EMPLOYEE table to increment the version for the given ID (or insert into the version table when a new id is made).

The overhead of that seems undesirable.
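
For what it's worth, the single query for Idea 1 could be a plain join against the version table. Here is a minimal sketch, assuming SQLite through Python's sqlite3 module and a few rows from the table above; the lowercase table and column names are just my spelling of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER, version INTEGER, name TEXT, position TEXT, pay INTEGER,
        PRIMARY KEY (id, version)
    );
    -- Idea 1: one row per employee holding that employee's current version.
    CREATE TABLE employee_version (id INTEGER PRIMARY KEY, version INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    (4, 4, "Cody Munn", "VP Ops", 80),
    (4, 5, "Cody Munn", "Retired", 0),
    (6, 1, "Hire Meplz", "Coder", 10),
    (9, 1, "Deby Olav", "Coder", 25),
    (9, 2, "Deby Olav", "VP Ops", 80),
])
conn.executemany("INSERT INTO employee_version VALUES (?, ?)", [(4, 5), (6, 1), (9, 2)])

# Join each employee row to its current version, then filter on position.
current_coders = conn.execute("""
    SELECT e.id, e.name
    FROM employee AS e
    JOIN employee_version AS v ON v.id = e.id AND v.version = e.version
    WHERE e.position = 'Coder'
    ORDER BY e.id
""").fetchall()
print(current_coders)
```

Deby's old Coder row is filtered out because her current version is the VP Ops one. The version table still has to be kept in sync on every insert, which is the overhead mentioned above.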

Idea number 2: Keep an archive table and a main table. Before updating the main table, insert the row I'm about to overwrite into the archive table, and use the main table as I normally would, as if I wasn't concerned about versioning.

Idea number 3: Find a query that adds something along the lines of SELECT * FROM EMPLOYEE WHERE Position = 'Coder' and version=MaxVersionForId(EMPLOYEE.ID)... Not entirely sure how I'd do this. This seems the best idea to me, but I'm really not sure at this point.

Idea number 4: Make a column for "current" and add "WHERE current = true AND ..."
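
A rough sketch of how Idea 4 might look, again assuming SQLite via Python's sqlite3. The column name is_current, the partial unique index, and the add_version helper are all my own inventions for illustration; other databases may need a different way to enforce "one current row per ID".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER, version INTEGER, name TEXT, position TEXT, pay INTEGER,
        is_current INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (id, version)
    );
    -- Partial unique index: at most one current row per employee id.
    CREATE UNIQUE INDEX one_current_per_id ON employee (id) WHERE is_current = 1;
""")

def add_version(conn, emp_id, name, position, pay):
    """Hypothetical helper: retire the current row and insert the next version."""
    with conn:  # one transaction: commits on success, rolls back on error
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM employee WHERE id = ?", (emp_id,)
        ).fetchone()
        conn.execute(
            "UPDATE employee SET is_current = 0 WHERE id = ? AND is_current = 1",
            (emp_id,),
        )
        conn.execute(
            "INSERT INTO employee (id, version, name, position, pay) "
            "VALUES (?, ?, ?, ?, ?)",
            (emp_id, last + 1, name, position, pay),
        )

add_version(conn, 9, "Deby Olav", "Coder", 25)
add_version(conn, 9, "Deby Olav", "VP Ops", 80)
add_version(conn, 6, "Hire Meplz", "Coder", 10)

current_coders = conn.execute(
    "SELECT id, name FROM employee WHERE is_current = 1 AND position = 'Coder'"
).fetchall()
print(current_coders)
```

The read side stays trivial, at the cost of an extra UPDATE on every write.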

It occurs to me that surely people have done this before, run into these same problems, and have insight on it to share, and so I come to collect that! :) I've tried to find examples of the problem on here already, but they seem specialized to a particular scenario.

Thanks!

EDIT 1:

Firstly, I appreciate all answers, and you've all said the same thing - DATE is better than VERSION NUMBER. One reason I was going with VERSION NUMBER was to simplify the process of updating in the server to prevent the following scenario:

Person A loads employee record 3 in his session, and it has version 4. Person B loads employee record 3 in his session, and it has version 4. Person A makes changes and commits. This works because the most recent version in the database is 4. It is now 5. Person B makes changes and commits. This fails because the most recent version is 5, while his is 4.
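
To make that scenario concrete, here is a small sketch of the version-number check, assuming SQLite via Python's sqlite3 and a primary key on (id, version). The save function and the edited values are made up for the example. The second writer fails because the row it tries to insert already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER, version INTEGER, name TEXT, position TEXT, pay INTEGER,
        PRIMARY KEY (id, version)
    )
""")
conn.execute("INSERT INTO employee VALUES (3, 4, 'Jane Munn', 'Retired', 0)")
conn.commit()

def save(conn, emp_id, loaded_version, name, position, pay):
    """Hypothetical helper: write version loaded_version + 1.

    Returns False when another session has already written that version.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
                (emp_id, loaded_version + 1, name, position, pay),
            )
        return True
    except sqlite3.IntegrityError:
        # The (id, version) primary key rejected the duplicate version.
        return False

a_ok = save(conn, 3, 4, "Jane Munn", "Admn Asst", 20)  # Person A loaded version 4
b_ok = save(conn, 3, 4, "Jane Munn", "Coder", 25)      # Person B also loaded version 4
print(a_ok, b_ok)
```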

How would the EFFECTIVE DATE pattern address this issue?

EDIT 2:

I think I could do it by doing something like this: Person A loads employee record 3 in his session, and its effective date is 1-1-2010, 1:00 pm, with no expiration. Person B loads employee record 3 in his session, and its effective date is 1-1-2010, 1:00 pm, with no expiration. Person A makes changes and commits. The old copy goes to the archive table (basically idea 2) with an expiration date of 9/22/2010 1:00 pm. The updated version of the main table has an effective date of 9/22/2010 1:00 pm. Person B makes changes and commits. The commit fails because the effective dates (in the database and session) don't match.
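
Sketched out, that could be a conditional UPDATE that only succeeds while the effective date still matches what the session loaded. This assumes SQLite via Python's sqlite3; the table layout, the save function, and the timestamps are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, pay INTEGER, effective_from TEXT
    );
    CREATE TABLE employee_archive (
        id INTEGER, name TEXT, pay INTEGER, effective_from TEXT, expired_at TEXT
    );
    INSERT INTO employee VALUES (3, 'Jane Munn', 20, '2010-01-01 13:00');
""")

def save(conn, emp_id, loaded_effective_from, new_pay, now):
    """Hypothetical helper: archive and update only if the row is unchanged."""
    with conn:  # both statements commit together
        # Copy the row being replaced; copies nothing if the date is stale.
        conn.execute(
            "INSERT INTO employee_archive "
            "SELECT id, name, pay, effective_from, ? FROM employee "
            "WHERE id = ? AND effective_from = ?",
            (now, emp_id, loaded_effective_from),
        )
        cur = conn.execute(
            "UPDATE employee SET pay = ?, effective_from = ? "
            "WHERE id = ? AND effective_from = ?",
            (new_pay, now, emp_id, loaded_effective_from),
        )
        return cur.rowcount == 1

a_ok = save(conn, 3, "2010-01-01 13:00", 22, "2010-09-22 13:00")  # Person A
b_ok = save(conn, 3, "2010-01-01 13:00", 25, "2010-09-22 13:05")  # Person B, stale
archived = conn.execute("SELECT pay, expired_at FROM employee_archive").fetchall()
print(a_ok, b_ok, archived)
```

One caveat with this sketch: if two commits can land within the same timestamp resolution, the dates would match when they shouldn't, so a version number may still be the safer token.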

Answered by NotMe

I think you've started down the wrong path.

Typically, for versioning or storing historical data you do one (or both) of two things.

  1. You have a separate table that mimics the original table + a date/time column for the date it was changed. Whenever a record is updated, you insert the existing contents into the history table just prior to the update.

  2. You have a separate warehouse database. In this case you can either version it just like in #1 above OR you simply snapshot it once every so often (hourly, daily, weekly..)

Keeping your version number in the same table as your normal one has several problems. First, the table size is going to grow like crazy. This will put constant pressure on normal production queries.

Second, it's going to radically increase your query complexity for joins etc in order to make sure the latest version of each record is being used.

Answered by Zachary Yates

What you have here is called a Slowly Changing Dimension (SCD). There are some proven methods for dealing with it:

http://en.wikipedia.org/wiki/Slowly_changing_dimension

Thought I'd add that since no one seems to call it by name.

Answered by James Cane

An approach that I've designed for a recent database is to use revisions as follows:

  • Keep your entity in two tables:

    1. "employee" stores a primary key ID and any data that you do not want to be versioned (if there is any).

    2. "employee_revision" stores all the salient data about the employee, with a foreign key to the employee table and a foreign key, "RevisionID" to a table called "revision".

  • Make a new table called "revision". This can be used by all the entities in your database, not just employee. It contains an identity column for the primary key (or AutoNumber, or whatever your database calls such a thing). It also contains EffectiveFrom and EffectiveTo columns. I also have a text column on the table - entity_type - for human readability reasons, which contains the name of the primary revision table (in this case "employee"). The revision table contains no foreign keys. The default value for EffectiveFrom is 1-Jan-1900 and the default value for EffectiveTo is 31-Dec-9999. This allows me to simplify the date querying.

I make sure that the revision table is well indexed on (EffectiveFrom, EffectiveTo, RevisionID) and also on (RevisionID, EffectiveFrom, EffectiveTo).

I can then use joins and simple < and > comparisons to select an appropriate record for any date. This also means that relations between entities are fully versioned. In fact, I find it useful to use SQL Server table-valued functions to allow very simple querying of any date.

Here's an example (assuming that you don't want to version employee names so that if they change their name, the change is effective historically).

--------
employee
--------
employee_id  |  employee_name
-----------  |  -------------
12351        |  John Smith

-----------------
employee_revision
-----------------
employee_id  |  revision_id  |  department_id  |  position_id  |  pay
-----------  |  -----------  |  -------------  |  -----------  |  ----------
12351        |  657442       |  72             |  23           |  22000.00
12351        |  657512       |  72             |  27           |  22000.00
12351        |  657983       |  72             |  27           |  28000.00

--------
revision
--------
revision_id  |  effective_from  |  effective_to  |  entity_type
-----------  |  --------------  |  ------------  |  -----------
657442       |  01-Jan-1900     |  03-Mar-2007   |  EMPLOYEE
657512       |  04-Mar-2007     |  22-Jun-2009   |  EMPLOYEE
657983       |  23-Jun-2009     |  31-Dec-9999   |  EMPLOYEE

One advantage of storing your revision metadata in a separate table is that it's easy to apply it consistently to all your entities. Another is that it's easier to expand it to include other things, such as branches or scenarios, without having to modify every table. My principal reason is that it keeps your main entity tables clear and uncluttered.

(The data and example above are fictional - my database does not model employees).
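
As a rough illustration of the date querying, here is a sketch of the three tables and an "as of" lookup. It assumes SQLite via Python's sqlite3 rather than SQL Server, stores dates as ISO strings so they compare correctly as text, and uses a plain Python function (employee_as_of, a name I made up) in place of a table-valued function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
    CREATE TABLE revision (
        revision_id    INTEGER PRIMARY KEY,
        effective_from TEXT NOT NULL DEFAULT '1900-01-01',
        effective_to   TEXT NOT NULL DEFAULT '9999-12-31',
        entity_type    TEXT
    );
    CREATE TABLE employee_revision (
        employee_id   INTEGER REFERENCES employee (employee_id),
        revision_id   INTEGER REFERENCES revision (revision_id),
        department_id INTEGER, position_id INTEGER, pay REAL
    );
    INSERT INTO employee VALUES (12351, 'John Smith');
    INSERT INTO revision VALUES
        (657442, '1900-01-01', '2007-03-03', 'EMPLOYEE'),
        (657512, '2007-03-04', '2009-06-22', 'EMPLOYEE'),
        (657983, '2009-06-23', '9999-12-31', 'EMPLOYEE');
    INSERT INTO employee_revision VALUES
        (12351, 657442, 72, 23, 22000.00),
        (12351, 657512, 72, 27, 22000.00),
        (12351, 657983, 72, 27, 28000.00);
""")

def employee_as_of(conn, as_of):
    """Return each employee's revision that was effective on the given date."""
    return conn.execute("""
        SELECT e.employee_name, er.position_id, er.pay
        FROM employee AS e
        JOIN employee_revision AS er ON er.employee_id = e.employee_id
        JOIN revision AS r ON r.revision_id = er.revision_id
        WHERE r.effective_from <= ? AND r.effective_to >= ?
    """, (as_of, as_of)).fetchall()

in_2008 = employee_as_of(conn, "2008-01-01")
in_2010 = employee_as_of(conn, "2010-01-01")
print(in_2008, in_2010)
```

Because the open-ended revision uses 31-Dec-9999 instead of NULL, the same two comparisons work for historical and current dates alike.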

Answered by RedFilter

Here is my suggested approach, which has worked very well for me in the past:

  • Forget the version number. Instead, use StartDate and EndDate columns
  • Write a trigger to ensure that there are no overlapping date ranges for the same ID, and that there is only ever one record with a NULL EndDate for the same ID (this is your currently effective record)
  • Put indexes on StartDate and EndDate; this should give you reasonable performance

This will easily let you report by date:

select *
from MyTable 
where MyReportDate between StartDate and EndDate

or get the current info:

select *
from MyTable 
where EndDate is null
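
Here is a small runnable sketch of both queries, assuming SQLite via Python's sqlite3 and made-up rows. One thing to watch: a comparison against a NULL EndDate evaluates to NULL, so a plain BETWEEN never matches the currently effective row. The COALESCE below is my addition to cover that case, not part of the approach as described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (
        ID INTEGER, Pay INTEGER, StartDate TEXT NOT NULL, EndDate TEXT
    );
    CREATE INDEX ix_mytable_dates ON MyTable (StartDate, EndDate);
    INSERT INTO MyTable VALUES
        (2, 20, '2008-01-01', '2008-12-31'),
        (2, 30, '2009-01-01', NULL);  -- NULL EndDate marks the current row
""")

def rows_as_of(conn, report_date):
    # COALESCE treats an open-ended (NULL) EndDate as far in the future.
    return conn.execute(
        "SELECT ID, Pay FROM MyTable "
        "WHERE ? BETWEEN StartDate AND COALESCE(EndDate, '9999-12-31')",
        (report_date,),
    ).fetchall()

past = rows_as_of(conn, "2008-06-01")
recent = rows_as_of(conn, "2010-06-01")
current = conn.execute("SELECT ID, Pay FROM MyTable WHERE EndDate IS NULL").fetchall()
print(past, recent, current)
```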

Answered by Mohammad Azhdari

Although the question was asked 8 years ago, it is worth mentioning that SQL Server 2016 has a feature exactly for this: system-versioned temporal tables.

Every table in SQL Server 2016 and above can have a history table, which SQL Server itself populates with the historical data automatically.

All you need is to add two datetime2 columns and one clause to the table:

CREATE TABLE Employee 
(
    Id int NOT NULL PRIMARY KEY CLUSTERED,
    [Name] varchar(50) NOT NULL,
    Position varchar(50)  NULL,
    Pay money NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom,ValidTo)
)  
WITH (SYSTEM_VERSIONING = ON);

The system-versioned table creates a history table that maintains the history of the data. You can give it a custom name: WITH (SYSTEM_VERSIONING = ON ( HISTORY_TABLE = dbo.EmployeeHistory ) );

In this link you can find more details about system-versioned temporal tables.

As @NotMe mentioned, history tables can grow very fast, so there are a few ways to get around this. Take a look here.

Answered by JNK

Idea 3 will work:

SELECT * FROM EMPLOYEE AS e1
WHERE Position = 'Coder'
AND Version = (
    SELECT MAX(Version) FROM Employee AS e2
    WHERE e1.ID=e2.ID)

You really want to use something like a date though, which is much easier to program and track, and will use the same logic (something like an EffectiveDate column).

EDIT:

Chris is totally correct about moving this info out of your production table for performance, especially if you expect frequent updates. Another option would be to make a VIEW, built off of this table, that only shows you the most recent version of each person's info.
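
Such a view might look roughly like this. The sketch assumes SQLite via Python's sqlite3, a handful of rows from the question's table, and a view name (current_employee) that I picked for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER, version INTEGER, name TEXT, position TEXT, pay INTEGER,
        PRIMARY KEY (id, version)
    );
    INSERT INTO employee VALUES
        (4, 4, 'Cody Munn',  'VP Ops',  80),
        (4, 5, 'Cody Munn',  'Retired',  0),
        (8, 1, 'Fran Hesky', 'Coder',   10),
        (9, 1, 'Deby Olav',  'Coder',   25),
        (9, 2, 'Deby Olav',  'VP Ops',  80);
    -- The view hides the max-version subquery from everyday queries.
    CREATE VIEW current_employee AS
        SELECT e1.*
        FROM employee AS e1
        WHERE e1.version = (
            SELECT MAX(e2.version) FROM employee AS e2 WHERE e2.id = e1.id
        );
""")

current_coders = conn.execute(
    "SELECT id, name FROM current_employee WHERE position = 'Coder'"
).fetchall()
print(current_coders)
```

Queries against the view then read exactly like the naive query from the question.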

Answered by Cruachan

You are definitely doing this wrong. Keeping a database running sweetly requires that you only have the minimum amount of data in your production tables that you need. Inevitably holding historical data in with the live adds redundancy that will complicate queries and slow performance, plus your successors are going to look really askew at this before submitting it to the DailyWTF!

Instead create a copy of the table - EmployeeHistorical for instance - but with the ID column not set as identity (you might choose to add an additional new ID column and a dateCreated timestamp column too). Then add a trigger to your Employee table that fires on update & delete and writes out a copy of the complete row to the Historical table. And while you're at it, capturing the ID of the user doing the edit often comes in handy for audit purposes.
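
A minimal sketch of that trigger setup, assuming SQLite via Python's sqlite3 (trigger syntax differs between databases). HistoryID and the trigger names are hypothetical, and capturing the editing user is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        ID INTEGER PRIMARY KEY, Name TEXT, Position TEXT, Pay INTEGER
    );
    CREATE TABLE EmployeeHistorical (
        HistoryID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- the extra new ID column
        ID          INTEGER,                            -- plain copy, not an identity
        Name TEXT, Position TEXT, Pay INTEGER,
        dateCreated TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Copy the old row out just before it is overwritten or removed.
    CREATE TRIGGER employee_before_update BEFORE UPDATE ON Employee
    BEGIN
        INSERT INTO EmployeeHistorical (ID, Name, Position, Pay)
        VALUES (OLD.ID, OLD.Name, OLD.Position, OLD.Pay);
    END;
    CREATE TRIGGER employee_before_delete BEFORE DELETE ON Employee
    BEGIN
        INSERT INTO EmployeeHistorical (ID, Name, Position, Pay)
        VALUES (OLD.ID, OLD.Name, OLD.Position, OLD.Pay);
    END;
""")

conn.execute("INSERT INTO Employee VALUES (2, 'Mark May', 'Coder', 20)")
conn.execute("UPDATE Employee SET Pay = 30 WHERE ID = 2")
conn.execute("DELETE FROM Employee WHERE ID = 2")

history = conn.execute(
    "SELECT ID, Position, Pay FROM EmployeeHistorical ORDER BY HistoryID"
).fetchall()
print(history)
```

The application code never touches the history table; it only reads and writes Employee as usual.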

Generally when I'm doing this on an active table I try and create the historical table in a different database as among other things this reduces fragmentation (and hence maintenance) on your prime database and it's easier to handle backups - as archives can grow very large.

Your issues about edit contention should be handled with the normal database transaction and locking mechanisms. Coding up ad hoc hacks to emulate these yourself is always time-consuming and error-prone (some edge condition you've not thought of always pops up, and to write locks correctly you've really got to grok semaphores, which is decidedly non-trivial).