Oracle: select the latest record per person

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/5387903/

Date: 2020-09-18 23:07:45  Source: igfitidea

Select the latest records per person

Tags: sql, oracle, greatest-n-per-group

Asked by alchn

I am wondering whether the following two SQL statements differ in performance or are essentially the same.

The goal is to select each person's latest spouse record from the spouse table (person_id, spouse_id, marriage_date):

select *
from spouse 
where (person_id, marriage_date) in ( select person_id, max(marriage_date) 
                                      from spouse  
                                      group by person_id
                                     )

select *
from spouse s1
where marriage_date = ( select max(marriage_date) 
                        from spouse s2  
                        where s1.person_id = s2.person_id  
                      )
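The two statements above can be checked against each other on a small data set. The following is a minimal sketch, not an Oracle session: it assumes Python's built-in sqlite3 module as a stand-in database (row-value IN needs SQLite 3.15 or later), and the sample rows are hypothetical. It only illustrates that both forms return the same rows; it says nothing about Oracle performance.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spouse (person_id INTEGER, spouse_id INTEGER, marriage_date TEXT)"
)
conn.executemany(
    "INSERT INTO spouse VALUES (?, ?, ?)",
    [
        (1, 10, "2001-05-01"),
        (1, 11, "2010-09-15"),
        (2, 20, "1999-01-20"),
        (3, 30, "2005-03-03"),
        (3, 31, "2015-07-07"),
    ],
)

# Variant 1: row-value IN against a grouped subquery.
in_query = """
    SELECT * FROM spouse
    WHERE (person_id, marriage_date) IN (
        SELECT person_id, MAX(marriage_date) FROM spouse GROUP BY person_id
    )
    ORDER BY person_id
"""

# Variant 2: correlated subquery evaluated per outer row.
correlated_query = """
    SELECT * FROM spouse s1
    WHERE marriage_date = (
        SELECT MAX(marriage_date) FROM spouse s2
        WHERE s1.person_id = s2.person_id
    )
    ORDER BY person_id
"""

in_rows = conn.execute(in_query).fetchall()
correlated_rows = conn.execute(correlated_query).fetchall()

print(in_rows)
print("same result:", in_rows == correlated_rows)
```

Dates are stored as ISO-8601 text in this sketch so that MAX compares them correctly as strings; in Oracle the column would normally be a DATE.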

This is a common reporting requirement, for example an employee's latest job or highest level of education. I would like to know whether you prefer one of the above statements over the other and why, or whether there is a better way (in terms of performance or readability) to handle these get-the-latest/maximum requirements.

Answered by Winston Smith

As already alluded to in a previous answer, you can use SQL Server windowing functions (Oracle calls them analytic functions and supports the same syntax) to achieve this quite nicely.

SELECT s1.person_id, s1.marriage_date
FROM spouse s1
JOIN ( 
    SELECT 
        person_id, 
        marriage_date,
        ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY marriage_date DESC) AS Priority
    FROM spouse
    ) s2
ON s2.person_id = s1.person_id
AND s2.marriage_date = s1.marriage_date  -- without this, every row of the person would match
AND s2.Priority = 1

In the joined subquery, we partition the data by person_id, then apply a ranking function (ROW_NUMBER()) that numbers the rows within each partition in descending order of marriage_date. The numbering restarts for each person_id, so the condition s2.Priority = 1 keeps only the row with the latest marriage_date for each person.
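To make the numbering concrete, here is a small sketch of what ROW_NUMBER() produces per partition. It assumes Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later) and uses hypothetical sample rows.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spouse (person_id INTEGER, spouse_id INTEGER, marriage_date TEXT)"
)
conn.executemany(
    "INSERT INTO spouse VALUES (?, ?, ?)",
    [
        (1, 10, "2001-05-01"),
        (1, 11, "2010-09-15"),
        (2, 20, "1999-01-20"),
    ],
)

# Number the rows of each person, newest marriage first.
numbered = conn.execute(
    """
    SELECT person_id,
           marriage_date,
           ROW_NUMBER() OVER (PARTITION BY person_id
                              ORDER BY marriage_date DESC) AS priority
    FROM spouse
    ORDER BY person_id, priority
    """
).fetchall()

for row in numbered:
    print(row)

# Keeping priority = 1 leaves exactly one (the latest) row per person.
latest = [(person_id, date) for (person_id, date, priority) in numbered if priority == 1]
print(latest)
```

The numbering restarts at 1 for every person_id, which is why filtering on 1 yields one row per person.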


Answered by zerkms

You can use analytic functions to do that:

SELECT *
  FROM (SELECT ROW_NUMBER() OVER (PARTITION BY person_id
                                   ORDER BY marriage_date DESC) AS r,
               t.*
          FROM spouse t) x
 WHERE x.r = 1
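One behavioural difference worth knowing: when a person has two rows with the same latest marriage_date, the MAX-based queries return both rows, while ROW_NUMBER() returns exactly one of them (which one is arbitrary unless the ORDER BY has a tie-breaker). RANK() keeps the ties. The sketch below illustrates this with hypothetical data, again assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle; the helper name latest_by_window is made up for this example.

```python
import sqlite3

# Hypothetical data: person 1 has two rows sharing the latest marriage_date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spouse (person_id INTEGER, spouse_id INTEGER, marriage_date TEXT)"
)
conn.executemany(
    "INSERT INTO spouse VALUES (?, ?, ?)",
    [
        (1, 10, "2010-09-15"),
        (1, 11, "2010-09-15"),
        (2, 20, "1999-01-20"),
    ],
)


def latest_by_window(function_name):
    """Run the analytic-function query with ROW_NUMBER or RANK.

    function_name is interpolated into the SQL text, so only pass
    trusted literals such as "ROW_NUMBER" or "RANK".
    """
    sql = f"""
        SELECT person_id, spouse_id, marriage_date
        FROM (SELECT {function_name}() OVER (PARTITION BY person_id
                                            ORDER BY marriage_date DESC) AS r,
                     t.*
              FROM spouse t) x
        WHERE x.r = 1
        ORDER BY person_id, spouse_id
    """
    return conn.execute(sql).fetchall()


# MAX-based correlated subquery, as in the question.
max_rows = conn.execute(
    """
    SELECT person_id, spouse_id, marriage_date
    FROM spouse s1
    WHERE marriage_date = (SELECT MAX(marriage_date)
                           FROM spouse s2
                           WHERE s1.person_id = s2.person_id)
    ORDER BY person_id, spouse_id
    """
).fetchall()

row_number_rows = latest_by_window("ROW_NUMBER")
rank_rows = latest_by_window("RANK")

print("MAX:       ", len(max_rows), "rows")
print("ROW_NUMBER:", len(row_number_rows), "rows")
print("RANK:      ", len(rank_rows), "rows")
```

Which behaviour is right depends on the report: choose ROW_NUMBER() if exactly one row per person is required, RANK() or the MAX form if tied rows should all appear.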

You can compare the efficiency of all these queries by looking at their execution plans.
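In Oracle, a plan is typically obtained with EXPLAIN PLAN FOR followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY). As a runnable illustration of the same workflow, the sketch below assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement instead. SQLite's planner and plan format are quite different from Oracle's, so the printed steps are only an analogy, and the table here is empty and has no indexes.

```python
import sqlite3

# SQLite stand-in; the plans shown are NOT what Oracle would produce.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spouse (person_id INTEGER, spouse_id INTEGER, marriage_date TEXT)"
)

queries = {
    "in_subquery": """
        SELECT * FROM spouse
        WHERE (person_id, marriage_date) IN (
            SELECT person_id, MAX(marriage_date) FROM spouse GROUP BY person_id
        )
    """,
    "correlated": """
        SELECT * FROM spouse s1
        WHERE marriage_date = (
            SELECT MAX(marriage_date) FROM spouse s2
            WHERE s1.person_id = s2.person_id
        )
    """,
    "row_number": """
        SELECT * FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY person_id
                                      ORDER BY marriage_date DESC) AS r,
                   t.*
            FROM spouse t
        ) x
        WHERE x.r = 1
    """,
}

plans = {}
for name, sql in queries.items():
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    plans[name] = [row[-1] for row in rows]
    print(name)
    for step in plans[name]:
        print("   ", step)
```

On a real system, run the comparison against production-sized data with current statistics, since the optimizer's choices depend on both.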

Answered by Martin Schapendonk

Your examples are functionally equivalent: they return the same rows. As a rule of thumb, use IN when the subquery is the most restrictive part (i.e. it eliminates a lot of records from the result set) and use a correlated subquery when the outer query is the most restrictive part (i.e. the subquery would produce more records than the total result set).

Based on table statistics and estimated cost, the Oracle query optimizer might rewrite an IN clause as a correlated subquery and vice versa. Look at the explain plans for your own situation and pick the query whose plan has the lowest cost.