Source: http://stackoverflow.com/questions/9430743/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Selecting most recent and specific version in each group of records, for multiple groups
Asked by orange
The problem:
I have a table that records data rows in foo. Each time a row is updated, a new row is inserted along with a revision number. The table looks like:
id rev field
1 1 test1
2 1 fsdfs
3 1 jfds
1 2 test2
Note that in the table the last record is a newer version of the first row.
Does anyone know of an efficient way to query for the latest version of the rows, and for a specific version of the records? For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 yields the rows with rev <= 1 and, in case of duplicated ids, chooses the one with the higher revision number (records 1, 2, 3).
I'm not actually sure if this is even possible in SQL Server...
I would prefer not to return the result in an iterative way.
Answered by Tim
To get only the latest revisions:
SELECT * from foo t1
WHERE t1.rev =
    (SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (or, if an item doesn't have that revision, the next smaller one):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
This might not be the most efficient approach, but right now I can't think of a better one.
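A minimal sketch of the query above with the revision passed as a parameter. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server and reuses the question's sample rows; the helper name rows_at_revision is hypothetical and not part of the answer.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, 1, "test1"), (2, 1, "fsdfs"), (3, 1, "jfds"), (1, 2, "test2")],
)


def rows_at_revision(conn, desired_rev):
    """Hypothetical helper: per id, return the row with the highest rev <= desired_rev."""
    query = """
        SELECT id, rev, field FROM foo t1
        WHERE t1.rev = (SELECT max(rev)
                        FROM foo t2
                        WHERE t2.id = t1.id
                          AND t2.rev <= ?)
        ORDER BY id
    """
    return conn.execute(query, (desired_rev,)).fetchall()


print(rows_at_revision(conn, 1))
print(rows_at_revision(conn, 2))
```

Passing a revision at least as large as the highest one in the table gives the same rows as the "latest revisions" query.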
Answered by Tony O'Hagan
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows, as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long-running batch update, without any window of time in which partially updated data was visible.
Aging
- Replace the rev column with an age column
- Create a view of the current latest data with filter: age = 0
- To create a new version of your data ...
  - INSERT: new rows with age = -1 - This was my slow long-running batch process.
  - UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
  - DELETE: rows having age > N in the subset - Optionally purge old data
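The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the names foo_rows, foo and MAX_AGE are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo_rows (id INTEGER, age INTEGER, field TEXT);
    -- Readers only ever see the current version (age = 0).
    CREATE VIEW foo AS SELECT id, field FROM foo_rows WHERE age = 0;
    INSERT INTO foo_rows VALUES (1, 0, 'test1'), (2, 0, 'fsdfs'), (3, 0, 'jfds');
""")

# INSERT step (the slow batch): stage the new version at age = -1.
conn.execute("INSERT INTO foo_rows VALUES (1, -1, 'test2')")
conn.commit()
# The view still shows the old data while the batch is staged.
staged_view = conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()

# UPDATE and DELETE steps in one transaction, limited to the subset (id = 1).
MAX_AGE = 1  # hypothetical retention: keep one old version per row
with conn:
    conn.execute("UPDATE foo_rows SET age = age + 1 WHERE id = 1")
    conn.execute("DELETE FROM foo_rows WHERE id = 1 AND age > ?", (MAX_AGE,))

current_view = conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()
print(staged_view)
print(current_view)
```

The switch from the old row to the new one happens entirely inside the UPDATE, so readers of the view see either the old version or the new one, never a mix.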
Indexing
- Create a composite index with age and then id, so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique while you're aging the rows (during UPDATE SET age=age+1), so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
Rollback
Finally ... Let's say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:
UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff
Existing Table
Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then adding the age column and the indexing, and then creating a view that includes the age = 0 condition and has the same name as the original table.
This strategy may or may not work, depending on the nature of the technology layers that depend on the original table, but in many cases swapping a view for a table should drop in just fine.
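A rough sketch of that migration, again assuming Python's built-in sqlite3 module as a stand-in: SQL Server would use sp_rename for the rename step, and SQLite has no clustered indexes, so the index here is an ordinary non-unique one. The names foo_rows and ix_foo_rows_age_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Existing table that callers already query by name.
    CREATE TABLE foo (id INTEGER, field TEXT);
    INSERT INTO foo VALUES (1, 'test1'), (2, 'fsdfs');

    -- 1. Rename the table (foo_rows is a hypothetical name).
    ALTER TABLE foo RENAME TO foo_rows;
    -- 2. Add the age column; existing rows become the current version.
    ALTER TABLE foo_rows ADD COLUMN age INTEGER NOT NULL DEFAULT 0;
    -- 3. Non-unique composite index on age, then id.
    CREATE INDEX ix_foo_rows_age_id ON foo_rows (age, id);
    -- 4. A view with the original name exposes only the current rows.
    CREATE VIEW foo AS SELECT id, field FROM foo_rows WHERE age = 0;
""")

# Existing read queries keep working against the original name.
rows = conn.execute("SELECT id, field FROM foo ORDER BY id").fetchall()
print(rows)
```

This covers reads only; code that writes to the original table name would need to be pointed at the renamed table.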
Notes
I recommend naming the age column RowAge to indicate that this pattern is being used, since it's clearer that it's a database-related value, and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
Unlike other solutions, this pattern works for non-SQL Server databases.
Answered by AakashM
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later.
Sample data:
DECLARE @foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT @foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records and, within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (so, the one with the highest rev) from each id group.
Output when @desiredRev = 2:
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1:
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
Answered by Treb
If you want the latest revision of each field, you can use:
SELECT C.rev, C.field FROM (
    SELECT MAX(A.rev) AS rev, A.id
    FROM yourtable A
    GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return:
rev field
1 fsdfs
1 jfds
2 test2
Answered by Pittsburgh DBA
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
Answered by crimaniak
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
Answered by Joonhui Kim
How about this?
select id, max(rev), field from foo group by id
To query a specific revision, e.g. revision 1:
select id, max(rev), field from foo where rev <= 1 group by id