SQL: fetch the row that has the maximum value for a column

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/121387/

Time: 2020-08-31 23:33:41  Source: igfitidea

Fetch the row which has the Max value for a column

Tags: sql, oracle, greatest-n-per-group

Asked by Umang

Table:

UserId, Value, Date.

I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date. Is there a way to do this simply in SQL? (Preferably Oracle)

Update: Apologies for any ambiguity: I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.

Accepted answer by David Aldridge

This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.

select userid,
       my_date,
       ...
from
(
select userid,
       my_date,
       ...
       max(my_date) over (partition by userid) max_my_date
from   users
)
where my_date = max_my_date

"Analytic functions rock"

Edit: With regard to the first comment ...

"using analytic queries and a self-join defeats the purpose of analytic queries"

There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.

"The default window in Oracle is from the first row in the partition to the current one"

The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.

The code works.
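
The inline-view approach can be tried without an Oracle instance. The following is a minimal sketch, assuming SQLite 3.25 or later (driven through Python's sqlite3 module) as a stand-in database; the sample rows and the value column are made up for illustration and are not part of the original answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, value TEXT, my_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a", "2009-01-01"),
        (1, "b", "2009-03-01"),
        (1, "c", "2009-03-01"),  # ties with the row above on the latest date
        (2, "d", "2008-12-31"),
    ],
)

# Same shape as the query above: compute the per-user maximum in an inline
# view, then place a predicate on it in the outer query.
latest_rows = conn.execute(
    """
    SELECT userid, value, my_date
    FROM (
        SELECT userid, value, my_date,
               MAX(my_date) OVER (PARTITION BY userid) AS max_my_date
        FROM users
    )
    WHERE my_date = max_my_date
    ORDER BY userid, value
    """
).fetchall()

for row in latest_rows:
    print(row)
```

User 1 comes back twice because two of its rows share the latest date, which matches the caveat about multiple rows stated at the top of this answer.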

Answer by Bill Karwin

I see many people use subqueries or else vendor-specific features to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.

SELECT t1.*
FROM mytable t1
  LEFT OUTER JOIN mytable t2
    ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date")
WHERE t2.UserId IS NULL;

In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.

(I put the identifier "Date" in delimiters because it's an SQL reserved word.)

If t1."Date" = t2."Date" for two rows of the same user, both rows are returned (doubling). Tables usually have an auto-increment (sequence) key, e.g. id. To avoid the doubling, the following can be used:

SELECT t1.*
FROM mytable t1
  LEFT OUTER JOIN mytable t2
    ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date") 
         OR (t1."Date" = t2."Date" AND t1.id < t2.id))
WHERE t2.UserId IS NULL;
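
To see the doubling and the effect of the tie-breaker side by side, here is a small sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in database and an auto-assigned integer id column; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is filled in automatically by SQLite (1, 2, 3, 4 in insertion order).
conn.execute(
    'CREATE TABLE mytable (id INTEGER PRIMARY KEY, UserId INTEGER, '
    'Value TEXT, "Date" TEXT)'
)
conn.executemany(
    'INSERT INTO mytable (UserId, Value, "Date") VALUES (?, ?, ?)',
    [
        (1, "a", "2009-01-01"),
        (1, "b", "2009-03-01"),
        (1, "c", "2009-03-01"),  # same latest date as the previous row
        (2, "d", "2008-12-31"),
    ],
)

# First form: rows tied on the latest date are all returned.
without_tiebreak = conn.execute(
    '''
    SELECT t1.id, t1.UserId, t1.Value
    FROM mytable t1
      LEFT OUTER JOIN mytable t2
        ON t1.UserId = t2.UserId AND t1."Date" < t2."Date"
    WHERE t2.UserId IS NULL
    ORDER BY t1.id
    '''
).fetchall()

# Second form: on equal dates the row with the larger id wins.
with_tiebreak = conn.execute(
    '''
    SELECT t1.id, t1.UserId, t1.Value
    FROM mytable t1
      LEFT OUTER JOIN mytable t2
        ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date")
             OR (t1."Date" = t2."Date" AND t1.id < t2.id))
    WHERE t2.UserId IS NULL
    ORDER BY t1.id
    '''
).fetchall()

print(without_tiebreak)
print(with_tiebreak)
```

With the tie-breaker, exactly one row per user survives: among rows sharing the latest date, the one with the highest id.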


Re comment from @Farhan:

Here's a more detailed explanation:

An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.

The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.

In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
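
The explanation can be made concrete by looking at the outer join before the WHERE filter is applied. This is a rough sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in database and a few made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (UserId INTEGER, Value TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "a", "2009-01-01"),
        (1, "b", "2009-03-01"),
        (2, "d", "2008-12-31"),
    ],
)

# The join without the WHERE clause: every t1 row appears, paired either with
# a later t2 row for the same user or with NULLs (None in Python).
joined = conn.execute(
    '''
    SELECT t1.Value, t1."Date", t2.Value, t2."Date"
    FROM mytable t1
      LEFT OUTER JOIN mytable t2
        ON t1.UserId = t2.UserId AND t1."Date" < t2."Date"
    ORDER BY t1.UserId, t1."Date"
    '''
).fetchall()

for row in joined:
    print(row)

# Keeping only the rows with no t2 match mirrors WHERE t2.UserId IS NULL.
latest = [row[:2] for row in joined if row[2] is None]
print(latest)
```

Row "a" finds a later row "b" for the same user, so it is paired with it and filtered out; rows "b" and "d" find no later row, so their t2 columns are NULL and they are kept.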

Answer by Dave Costa

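-- "date" and "table" are reserved words in Oracle; my_date and mytable are placeholder names.
SELECT userid,
       MAX(value) KEEP (DENSE_RANK FIRST ORDER BY my_date DESC) AS latest_value
  FROM mytable
  GROUP BY userid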

Answer by Steve K

I don't know your exact column names, but it would be something like this:

    select userid, value
      from users u1
     where date = (select max(date)
                     from users u2
                    where u1.userid = u2.userid)
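
A quick way to check the correlated-subquery form is to run it against a throwaway table. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in database and uses my_date as a hypothetical column name in place of date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, value TEXT, my_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "a", "2009-01-01"),
        (1, "b", "2009-03-01"),
        (2, "c", "2008-12-31"),
        (2, "d", "2008-06-30"),
    ],
)

# For each outer row u1, the subquery looks up the latest date for the same
# userid; the row is kept only if its own date equals that maximum.
latest_values = conn.execute(
    """
    SELECT userid, value
      FROM users u1
     WHERE my_date = (SELECT MAX(my_date)
                        FROM users u2
                       WHERE u1.userid = u2.userid)
     ORDER BY userid
    """
).fetchall()

print(latest_values)
```

As with the other equality-based forms, a user whose latest date appears on several rows would be returned once per such row.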

Answer by Mike Woodhouse

Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery, which is seldom a good idea.

Something like this, perhaps (can't remember if the column list should be parenthesised or not):

SELECT * 
FROM MyTable
WHERE (User, Date) IN
  ( SELECT User, MAX(Date) FROM MyTable GROUP BY User)

EDIT: Just tried it for real:

SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
  2  where (usr, dt) in 
  3  ( select usr, max(dt) from mytable group by usr)
  4  /

U DT
- ---------
A 01-JAN-09
B 01-JAN-09

So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.

Answer by mancaus

I know you asked for Oracle, but in SQL Server 2005 we now use this:


-- Single Value
;WITH ByDate
AS (
SELECT UserId, Value, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1

-- Multiple values where dates match
;WITH ByDate
AS (
SELECT UserId, Value, RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
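
The difference between the two variants only shows up when dates tie. Here is a minimal sketch of that, assuming SQLite 3.25 or later (via Python's sqlite3 module) instead of SQL Server; the column name Dt and the helper latest_by are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dt stands in for the Date column used in the queries above (an assumption).
conn.execute("CREATE TABLE UserDates (UserId INTEGER, Value TEXT, Dt TEXT)")
conn.executemany(
    "INSERT INTO UserDates VALUES (?, ?, ?)",
    [
        (1, "a", "2009-01-01"),
        (1, "b", "2009-03-01"),
        (1, "c", "2009-03-01"),  # ties with "b" on the latest date
        (2, "d", "2008-12-31"),
    ],
)


def latest_by(ranking_function):
    """Return the rows ranked first per UserId by the given window function."""
    sql = f"""
        WITH ByDate AS (
            SELECT UserId, Value,
                   {ranking_function}() OVER (
                       PARTITION BY UserId ORDER BY Dt DESC
                   ) AS Rnk
            FROM UserDates
        )
        SELECT UserId, Value FROM ByDate WHERE Rnk = 1
        ORDER BY UserId, Value
    """
    return conn.execute(sql).fetchall()


single = latest_by("ROW_NUMBER")  # one row per user; tie winner is arbitrary
with_ties = latest_by("RANK")     # every row that shares the latest date

print(single)
print(with_ties)
```

ROW_NUMBER numbers tied rows 1 and 2 in an unspecified order, so which of "b" and "c" is returned is not guaranteed unless the ORDER BY is extended with a tie-breaking column.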

Answer by user11318

I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:

SELECT DISTINCT
    UserId
  , MaxValue
FROM (
    SELECT UserId
      -- FIRST_VALUE is the analytic function that returns the first value in the window
      , FIRST_VALUE (Value) OVER (
          PARTITION BY UserId
          ORDER BY Date DESC
        ) MaxValue
    FROM SomeTable
  )

I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.

If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.

Under the hood, analytic queries sort the whole dataset, then process it sequentially. As the dataset is processed it is partitioned according to certain criteria, and for each row the query looks at some window (which defaults to the range from the first row in the partition to the current row; that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).

In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row it returns the UserId and the Value from the first row seen for that UserId (since dates are sorted DESC, that's the row with the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
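
The two stages described here (duplicated inner rows, then DISTINCT) can be observed separately. This is a small sketch, assuming SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in database, the standard FIRST_VALUE window function, and a hypothetical Dt column in place of Date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (UserId INTEGER, Value INTEGER, Dt TEXT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?)",
    [
        (1, 10, "2009-01-01"),
        (1, 20, "2009-03-01"),
        (2, 30, "2008-12-31"),
        (2, 40, "2008-06-30"),
    ],
)

inner_sql = """
    SELECT UserId,
           FIRST_VALUE(Value) OVER (
               PARTITION BY UserId ORDER BY Dt DESC
           ) AS MaxValue
    FROM SomeTable
"""

# Inner query alone: one output row per input row, so each user repeats.
inner_rows = conn.execute(inner_sql + " ORDER BY UserId").fetchall()

# Outer DISTINCT collapses the repeats to one row per user.
distinct_rows = conn.execute(
    f"SELECT DISTINCT UserId, MaxValue FROM ({inner_sql}) ORDER BY UserId"
).fetchall()

print(inner_rows)
print(distinct_rows)
```

Despite the MaxValue alias, the result is the Value on the latest date, not the largest Value: user 2 gets 30, although 40 exists on an earlier date.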

This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
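
The running-total case mentioned above might look roughly like the following. It is only a sketch, assuming SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in database; the receipts table, its columns, and the amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (userid INTEGER, receipt_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 5), (1, 3, 20), (2, 4, 7), (2, 5, 3)],
)

# With an ORDER BY inside OVER, SUM accumulates from the start of the
# partition up to the current row, giving a per-user running total.
running = conn.execute(
    """
    SELECT userid, receipt_id, amount,
           SUM(amount) OVER (
               PARTITION BY userid ORDER BY receipt_id
           ) AS running_total
    FROM receipts
    ORDER BY userid, receipt_id
    """
).fetchall()

for row in running:
    print(row)
```

Each user's total restarts at that user's first receipt, because the window is partitioned by userid.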

Answer by wcw

Wouldn't a QUALIFY clause be both simplest and best?

select userid, my_date, ...
from users
qualify rank() over (partition by userid order by my_date desc) = 1

For context, on Teradata a decent-size test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.

Answer by Cito

With PostgreSQL 8.4 or later, you can use this:

select user_id, user_value_1, user_value_2
  from (select user_id, user_value_1, user_value_2, row_number()
          over (partition by user_id order by user_date desc) 
        from users) as r
  where r.row_number=1

Answer by Gurwinder Singh

In Oracle 12c+, you can use Top n queries along with the analytic function rank to achieve this very concisely without subqueries:

select *
from your_table
order by rank() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;

The above returns all the rows with max my_date per user.

If you want only one row with max date, then replace rank with row_number:

select *
from your_table
order by row_number() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;