Original source: http://stackoverflow.com/questions/17105792/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to select the last occurrence of a duplicate record in Oracle
Asked by Jonathan Handoyo
I am having a problem with an Oracle query. The basic goal is to get the last row of each group of recurring rows, but there is a complication that the data should make clear:
Suppose I have a table that looks like this:
ID | COL1 | COL2 | COL3 | UPDATED_DATE
------|------|------|------|-------------
001 | a | b | c | 14/05/2013
002 | a | b | c | 16/05/2013
003 | a | b | c | 12/05/2013
You should be able to guess that, since columns 1 to 3 have the same values in all 3 rows, these rows are recurring data. The problem is that I want the most recently updated row, which is row #2.
I have an existing query that works if the table has no ID column, but I still need that column, so if anybody could help point out what I'm doing wrong, that would be great.
select col1,
       col2,
       col3,
       max(updated_date)
from   tbl
group by col1, col2, col3;
The above query returns the values of row #2, which is correct, but I still need the ID.
Note: I know that I could wrap the above query in another query that selects the ID column based on the 4 columns, but since I'm dealing with millions of records, re-querying would make the app very inefficient.
Answered by peterm
Try
WITH qry AS
(
  SELECT ID, COL1, COL2, COL3, updated_date,
         ROW_NUMBER() OVER (PARTITION BY COL1, COL2, COL3 ORDER BY updated_date DESC) rn
    FROM tbl
)
SELECT ID, COL1, COL2, COL3, updated_date
  FROM qry
 WHERE rn = 1
or
SELECT t1.ID, t2.COL1, t2.COL2, t2.COL3, t2.updated_date
  FROM tbl t1 JOIN
(
  SELECT COL1, COL2, COL3, MAX(updated_date) updated_date
    FROM tbl
   GROUP BY COL1, COL2, COL3
) t2 ON t1.COL1 = t2.COL1
    AND t1.COL2 = t2.COL2
    AND t1.COL3 = t2.COL3
    AND t1.updated_date = t2.updated_date
Output in both cases:
| ID | COL1 | COL2 | COL3 | UPDATED_DATE               |
--------------------------------------------------------
|  2 | a    | b    | c    | May, 16 2013 00:00:00+0000 |
Here is a SQLFiddle demo for both queries.
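Both queries can also be checked locally without an Oracle instance. The following is only a rough sketch, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or newer), stores the dates as ISO strings so that MAX() and ORDER BY sort chronologically, and names the window alias `rn`. The `ROWS` list and the `run_both` helper are hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical stand-in for the Oracle table `tbl` from the question.
# Assumption: dates are stored as ISO strings so they sort chronologically.
ROWS = [
    ("001", "a", "b", "c", "2013-05-14"),
    ("002", "a", "b", "c", "2013-05-16"),
    ("003", "a", "b", "c", "2013-05-12"),
]

# First approach: number the rows in each (col1, col2, col3) group,
# newest first, and keep only row number 1.
WINDOW_QUERY = """
WITH qry AS (
    SELECT id, col1, col2, col3, updated_date,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY updated_date DESC) AS rn
    FROM tbl
)
SELECT id, col1, col2, col3, updated_date
FROM qry
WHERE rn = 1
"""

# Second approach: compute the latest date per group, then join back
# to the table to recover the ID of the matching row(s).
JOIN_QUERY = """
SELECT t1.id, t2.col1, t2.col2, t2.col3, t2.updated_date
FROM tbl t1
JOIN (
    SELECT col1, col2, col3, MAX(updated_date) AS updated_date
    FROM tbl
    GROUP BY col1, col2, col3
) t2 ON  t1.col1 = t2.col1
     AND t1.col2 = t2.col2
     AND t1.col3 = t2.col3
     AND t1.updated_date = t2.updated_date
"""


def run_both(rows):
    """Load rows into a fresh in-memory table and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl "
        "(id TEXT, col1 TEXT, col2 TEXT, col3 TEXT, updated_date TEXT)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", rows)
    window_rows = conn.execute(WINDOW_QUERY).fetchall()
    # Sort the join result so its order is deterministic.
    join_rows = sorted(conn.execute(JOIN_QUERY).fetchall())
    conn.close()
    return window_rows, join_rows


if __name__ == "__main__":
    window_rows, join_rows = run_both(ROWS)
    print("ROW_NUMBER version:", window_rows)
    print("JOIN version:      ", join_rows)
```

One difference worth keeping in mind: if two rows in the same group share the latest updated_date, the ROW_NUMBER() version returns only one of them (which one is unspecified unless a tie-breaker is added to the ORDER BY), while the join version returns both.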