Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/9430743/

Time: 2020-09-01 14:32:34  Source: igfitidea

Selecting most recent and specific version in each group of records, for multiple groups

Tags: sql, sql-server, tsql

Question by orange

The problem:
I have a table, foo, that records data rows. Each time a row is updated, a new row is inserted along with a revision number. The table looks like:

id  rev field
1   1   test1
2   1   fsdfs
3   1   jfds
1   2   test2

Note that in the table the last record is a newer version of the first row.

Does anyone know of an efficient way to query for the latest version of the rows, and for a specific version of the records? For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 yields the rows with rev <= 1 and, in case of duplicate ids, picks the one with the highest revision number (records 1, 2, 3).

I'm not actually sure if this is even possible in SQL Server...

I would prefer not to return the result iteratively.
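To pin down the semantics being asked for, here is a small pure-Python reference model. It is not SQL and not part of the original question. For each id, it keeps the row with the highest rev that does not exceed the requested revision. The names Row, ROWS and as_of are hypothetical; the data is the sample table above.

```python
from typing import Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    id: int
    rev: int
    field: str


# The question's sample data (hypothetical in-memory copy of table foo).
ROWS = [
    Row(1, 1, "test1"),
    Row(2, 1, "fsdfs"),
    Row(3, 1, "jfds"),
    Row(1, 2, "test2"),
]


def as_of(rows: List[Row], max_rev: Optional[int] = None) -> List[Row]:
    """Newest row per id, ignoring revisions above max_rev (None = latest)."""
    best: Dict[int, Row] = {}
    for row in rows:
        if max_rev is not None and row.rev > max_rev:
            continue  # this revision did not exist yet at max_rev
        current = best.get(row.id)
        if current is None or row.rev > current.rev:
            best[row.id] = row  # keep the highest qualifying revision
    return sorted(best.values(), key=lambda r: r.id)
```

With this model, as_of(ROWS, 2) returns the test2, fsdfs and jfds rows, and as_of(ROWS, 1) returns the test1, fsdfs and jfds rows. Both match the results described in the question, so the model is a convenient yardstick for the SQL answers.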

Answer by Tim

To get only the latest revisions:

SELECT * FROM foo t1
WHERE t1.rev =
  (SELECT MAX(rev) FROM foo t2 WHERE t2.id = t1.id)

To get a specific revision, in this case 1 (or, if an item doesn't have that revision yet, its closest earlier revision):

SELECT * from foo t1
WHERE t1.rev = 
  (SELECT max(rev) 
   FROM foo t2 
   WHERE t2.id = t1.id
   AND t2.rev <= 1)

This might not be the most efficient way to do it, but right now I can't think of a better one.
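Both queries can be tried without a SQL Server instance. The sketch below makes a few assumptions. It uses SQLite through Python's built-in sqlite3 module, loads the question's sample rows into a table named foo, and adds ORDER BY only to make the output deterministic. The :max_rev named parameter is introduced for illustration.

```python
import sqlite3

# SQLite stand-in for the question's table (the answer targets SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO foo (id, rev, field) VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)

# Latest revision per id: keep a row if its rev is the id's maximum rev.
LATEST = """
SELECT t1.id, t1.rev, t1.field FROM foo t1
WHERE t1.rev = (SELECT MAX(rev) FROM foo t2 WHERE t2.id = t1.id)
ORDER BY t1.id
"""

# Specific revision: the maximum is taken only over revs <= :max_rev.
AS_OF = """
SELECT t1.id, t1.rev, t1.field FROM foo t1
WHERE t1.rev = (SELECT MAX(rev) FROM foo t2
                WHERE t2.id = t1.id AND t2.rev <= :max_rev)
ORDER BY t1.id
"""

latest = conn.execute(LATEST).fetchall()
as_of_1 = conn.execute(AS_OF, {"max_rev": 1}).fetchall()
print(latest)   # [(1, 2, 'test2'), (2, 1, 'fsdfs'), (3, 1, 'jfds')]
print(as_of_1)  # [(1, 1, 'test1'), (2, 1, 'fsdfs'), (3, 1, 'jfds')]
```

If an id has no revision at or below max_rev, the subquery returns NULL. The comparison t1.rev = NULL is then never true, so that id is simply left out, which is usually what a "state as of revision N" query should do.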

Answer by Tony O'Hagan

Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows, because it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long-running batch update, without any window of time in which partially updated data was visible.

Aging

  • Replace the rev column with an age column
  • Create a view of the current latest data with the filter: age = 0
  • To create a new version of your data ...
    • INSERT: new rows with age = -1 - This was my slow, long-running batch process.
    • UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
    • DELETE: rows having age > N in the subset - Optionally purge old data
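The aging steps above can be sketched as follows. This is a minimal illustration under several assumptions. It uses SQLite via Python's sqlite3 module rather than SQL Server, the names foo_versions, foo_current and publish are hypothetical, and it ages the whole table instead of a subset. The staged INSERT runs outside the switch-over transaction; only the UPDATE and the optional DELETE are committed together.

```python
import sqlite3

# isolation_level=None: sqlite3 opens no implicit transactions, so the
# explicit BEGIN/COMMIT below fully control the switch-over transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE foo_versions (id INTEGER, age INTEGER, field TEXT);
-- Readers only ever see age = 0.
CREATE VIEW foo_current AS SELECT id, field FROM foo_versions WHERE age = 0;
INSERT INTO foo_versions VALUES (1, 0, 'test1'), (2, 0, 'fsdfs'), (3, 0, 'jfds');
""")


def publish(conn, rows, keep=1):
    """Stage a new data set at age -1, then promote it and age older rows."""
    # INSERT: the slow batch load; age = -1 rows are invisible to foo_current.
    conn.executemany(
        "INSERT INTO foo_versions (id, age, field) VALUES (?, -1, ?)", rows
    )
    conn.execute("BEGIN")
    try:
        # UPDATE: -1 becomes 0 (new current), 0 becomes 1, and so on.
        conn.execute("UPDATE foo_versions SET age = age + 1")
        # DELETE: optionally purge versions older than `keep`.
        conn.execute("DELETE FROM foo_versions WHERE age > ?", (keep,))
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


publish(conn, [(1, "test2"), (2, "fsdfs"), (3, "jfds")])
print(conn.execute("SELECT id, field FROM foo_current ORDER BY id").fetchall())
# [(1, 'test2'), (2, 'fsdfs'), (3, 'jfds')]
```

To age only a subset, add the subset's predicate (for example, a hypothetical batch or partition column) to both the UPDATE and the DELETE.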

Indexing

  • Create a composite index on age and then id, so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique while you're aging the rows (during UPDATE SET age = age + 1), so you'll need to make it non-unique, and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
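Here is a sketch of the two suggested indexes, again in SQLite through Python's sqlite3 module, with hypothetical table, view and index names. SQLite has no clustered indexes, so the sketch only illustrates the key order. EXPLAIN QUERY PLAN checks that a lookup by id through the age = 0 view is answered from an index rather than a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_versions (id INTEGER, age INTEGER, field TEXT);
-- Non-unique on purpose: per the answer, (age, id) can collide while
-- UPDATE ... SET age = age + 1 is running.
CREATE INDEX ix_foo_versions_age_id ON foo_versions (age, id);
-- Optional: all versions of one id, ordered by age.
CREATE INDEX ix_foo_versions_id_age ON foo_versions (id, age);
CREATE VIEW foo_current AS SELECT id, field FROM foo_versions WHERE age = 0;
""")

# Ask SQLite how it would answer a lookup by id through the view.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, field FROM foo_current WHERE id = 1"
).fetchall()
details = [step[-1] for step in plan]  # last column: human-readable step
print(details)  # expected to mention one of the two indexes
```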

Rollback

Finally... let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:

  • UPDATE table-name SET age = age - 1 -- Roll back a version
  • DELETE table-name WHERE age < 0 -- Clean up bad stuff
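The rollback can be sketched the same way, using SQLite via Python's sqlite3 and hypothetical names. The example assumes the bad data set has already been promoted to age = 0. If the batch broke during the INSERT step, before the ages were bumped, the bad rows are still at age = -1. In that case only the DELETE is needed, because running the UPDATE first would also discard the current good version.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE foo_versions (id INTEGER, age INTEGER, field TEXT);
CREATE VIEW foo_current AS SELECT id, field FROM foo_versions WHERE age = 0;
-- age 0: the bad data set that was just promoted; age 1: the previous good one.
INSERT INTO foo_versions VALUES
    (1, 0, 'bad'), (2, 0, 'bad'),
    (1, 1, 'test2'), (2, 1, 'fsdfs');
""")

conn.execute("BEGIN")
conn.execute("UPDATE foo_versions SET age = age - 1")   # roll back a version
conn.execute("DELETE FROM foo_versions WHERE age < 0")  # clean up the bad rows
conn.execute("COMMIT")

print(conn.execute("SELECT id, field FROM foo_current ORDER BY id").fetchall())
# [(1, 'test2'), (2, 'fsdfs')]
```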

Existing Table

Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then adding the age column and indexing, and finally creating a view that has the same name as the original table and includes the age = 0 condition.

This strategy may or may not work, depending on the technology layers that depend on the original table, but in many cases a view should work fine as a drop-in replacement for the table.
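The conversion of an existing table can be sketched as below, in SQLite through Python's sqlite3 module. The names foo, foo_versions and the index name are hypothetical. Existing rows receive age = 0 from the column default, so the new view returns exactly the old contents. One limitation relevant to the caveat above: SQLite views are read-only unless INSTEAD OF triggers are added, so code that writes to the table needs more care than code that only reads it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The existing table that other code already reads by name.
CREATE TABLE foo (id INTEGER, field TEXT);
INSERT INTO foo VALUES (1, 'test1'), (2, 'fsdfs');

-- 1. Rename the table out of the way.
ALTER TABLE foo RENAME TO foo_versions;
-- 2. Add the age column; existing rows become the current version (age = 0).
ALTER TABLE foo_versions ADD COLUMN age INTEGER NOT NULL DEFAULT 0;
-- 3. Index it (non-unique, age first).
CREATE INDEX ix_foo_versions_age_id ON foo_versions (age, id);
-- 4. Recreate the old name as a view over the current rows.
CREATE VIEW foo AS SELECT id, field FROM foo_versions WHERE age = 0;
""")

# An unchanged reader query now goes through the view.
print(conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall())
# [(1, 'test1'), (2, 'fsdfs')]
```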

Notes

I recommend naming the age column RowAge to indicate that this pattern is being used, since that makes it clearer that it's a database-related value, and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.

Unlike other solutions, this pattern works for non-SQL Server databases.

Answer by AakashM

This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later (the multi-row INSERT ... VALUES in the sample data requires SQL Server 2008 or later).

Sample data:

DECLARE @foo TABLE (
    id int,
    rev int,
    field nvarchar(10)
)

INSERT @foo VALUES
    ( 1, 1, 'test1' ),
    ( 2, 1, 'fdsfs' ),
    ( 3, 1, 'jfds' ),
    ( 1, 2, 'test2' )

The query:

DECLARE @desiredRev int

SET @desiredRev = 2

SELECT * FROM (
SELECT 
    id,
    rev,
    field,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev 
) numbered
WHERE rn = 1

The inner SELECT returns all relevant records and, within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.

The outer SELECT just selects the first member (that is, the one with the highest rev) from each id group.

Output when @desiredRev = 2:

id          rev         field      rn
----------- ----------- ---------- --------------------
1           2           test2      1
2           1           fdsfs      1
3           1           jfds       1

Output when @desiredRev = 1:

id          rev         field      rn
----------- ----------- ---------- --------------------
1           1           test1      1
2           1           fdsfs      1
3           1           jfds       1
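To reproduce these two outputs without SQL Server, the same ROW_NUMBER() query can be run in SQLite, which supports window functions from version 3.25. This sketch uses Python's sqlite3 module and makes three changes that are assumptions of the sketch, not part of the answer: it replaces @desiredRev with a named parameter :desired_rev, drops the rn column from the outer select, and adds ORDER BY id. The helper name rows_as_of is hypothetical.

```python
import sqlite3

# Window functions need SQLite 3.25 or later; check sqlite3.sqlite_version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fdsfs"), (3, 1, "jfds"), (1, 2, "test2")],
)

QUERY = """
SELECT id, rev, field FROM (
    SELECT id, rev, field,
           -- rn = 1 marks the highest remaining rev within each id
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
    FROM foo
    WHERE rev <= :desired_rev
) AS numbered
WHERE rn = 1
ORDER BY id
"""


def rows_as_of(desired_rev):
    return conn.execute(QUERY, {"desired_rev": desired_rev}).fetchall()


print(rows_as_of(2))  # [(1, 2, 'test2'), (2, 1, 'fdsfs'), (3, 1, 'jfds')]
print(rows_as_of(1))  # [(1, 1, 'test1'), (2, 1, 'fdsfs'), (3, 1, 'jfds')]
```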

Answer by Treb

If you want the latest revision of each row, you can use:

SELECT C.rev, C.field FROM (
  SELECT MAX(A.rev) AS rev, A.id
  FROM yourtable A
  GROUP BY A.id
) AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev

In the case of your example, that would return

 rev field
 1   fsdfs   
 1   jfds   
 2   test2
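The same join can be run in SQLite via Python's sqlite3 module as a quick check. This sketch also selects C.id so the rows are identifiable. It adds an optional rev <= :max_rev filter inside the derived table B to cover the question's "specific version" case; that filter is an addition, not part of the answer. The sketch assumes (id, rev) is unique, since duplicate pairs would produce duplicate output rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)

# B: one (id, highest rev) pair per id; joining C on both columns
# fetches the full row for exactly that revision.
LATEST = """
SELECT C.id, C.rev, C.field
FROM (SELECT A.id, MAX(A.rev) AS rev FROM yourtable A GROUP BY A.id) AS B
INNER JOIN yourtable C ON B.id = C.id AND B.rev = C.rev
ORDER BY C.id
"""

# Addition for a specific version: filter revs before taking the maximum.
AS_OF = """
SELECT C.id, C.rev, C.field
FROM (SELECT A.id, MAX(A.rev) AS rev FROM yourtable A
      WHERE A.rev <= :max_rev GROUP BY A.id) AS B
INNER JOIN yourtable C ON B.id = C.id AND B.rev = C.rev
ORDER BY C.id
"""

print(conn.execute(LATEST).fetchall())
print(conn.execute(AS_OF, {"max_rev": 1}).fetchall())
```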

Answer by Pittsburgh DBA

SELECT
  MaxRevs.id,
  revision.field
FROM
  (SELECT
     id,
     MAX(rev) AS MaxRev
   FROM revision
   GROUP BY id
  ) MaxRevs
  INNER JOIN revision 
    ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev

Answer by crimaniak

SELECT foo.* from foo 
left join foo as later 
on foo.id=later.id and later.rev>foo.rev 
where later.id is null;
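This query is an anti-join. A row is kept only when the LEFT JOIN finds no row with the same id and a higher rev, so later.id is NULL exactly for the newest revision of each id. Below is a sketch in SQLite through Python's sqlite3 module. It also includes an as-of variant, which is an addition of this sketch: revisions above :max_rev are excluded both from the candidate rows and from the "later" side of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)

# The answer's query: keep rows for which no later revision of the same id exists.
LATEST = """
SELECT foo.* FROM foo
LEFT JOIN foo AS later ON foo.id = later.id AND later.rev > foo.rev
WHERE later.id IS NULL
ORDER BY foo.id
"""

# As-of variant (an addition): ignore revisions above :max_rev on both sides.
AS_OF = """
SELECT foo.* FROM foo
LEFT JOIN foo AS later
  ON foo.id = later.id AND later.rev > foo.rev AND later.rev <= :max_rev
WHERE later.id IS NULL AND foo.rev <= :max_rev
ORDER BY foo.id
"""

print(conn.execute(LATEST).fetchall())
print(conn.execute(AS_OF, {"max_rev": 1}).fetchall())
```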

Answer by Joonhui Kim

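How about this? (Caution: SQL Server rejects this query, because field is neither in the GROUP BY clause nor inside an aggregate. Databases that do accept such bare columns, such as MySQL with ONLY_FULL_GROUP_BY disabled, do not guarantee that field comes from the max(rev) row.)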

select id, max(rev), field from foo group by id

For querying a specific revision, e.g. revision 1:

select id, max(rev), field from foo where rev <= 1 group by id