SQL join: selecting the last records in a one-to-many relationship

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/2111384/

Date: 2020-09-01 05:09:39  Source: igfitidea

Tags: sql, select, join, indexing, greatest-n-per-group

Question by netvope

Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?

Please use these table/column names in your answer:

  • customer: id, name
  • purchase: id, customer_id, item_id, date

And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?

If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?

Answer by Bill Karwin

This is an example of the greatest-n-per-group problem that has appeared regularly on StackOverflow.

Here's how I usually recommend solving it:

SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND 
    (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or, in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.
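
To see the tie-breaking in action, here is a minimal sketch that runs the query above against an in-memory SQLite database through Python's standard sqlite3 module. The helper names (make_db, last_purchases) and the sample rows are hypothetical: Alice has two purchases on the same latest date (ids 11 and 12), Bob has one, and Carol has none. Dates are assumed to be stored as ISO-8601 text, so that comparing strings compares dates.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),  -- same date as 11: a tie
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def last_purchases(conn):
    # Keep p1 only if no p2 of the same customer is later by date,
    # or has the same date and a higher id.
    return conn.execute("""
        SELECT c.id, c.name, p1.id, p1.date
        FROM customer c
        JOIN purchase p1 ON (c.id = p1.customer_id)
        LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
            (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
        WHERE p2.id IS NULL
        ORDER BY c.id
    """).fetchall()


if __name__ == "__main__":
    for row in last_purchases(make_db()):
        print(row)
```

Purchase 12 wins Alice's tie because of the id comparison, and Carol does not appear at all, because the join to p1 is an inner join.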

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan, e.g. EXPLAIN on MySQL.
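
As one way to check whether such an index is picked up, the sketch below creates it in SQLite and collects the output of EXPLAIN QUERY PLAN, SQLite's counterpart to MySQL's EXPLAIN. The index name idx_purchase_customer_date_id and the helper names are hypothetical, and the wording of the plan text differs between SQLite versions, so read the printed lines rather than relying on an exact format.

```python
import sqlite3

ANTI_JOIN = """
    SELECT c.*, p1.*
    FROM customer c
    JOIN purchase p1 ON (c.id = p1.customer_id)
    LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
        (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
    WHERE p2.id IS NULL
"""


def make_db():
    """Empty in-memory database with the question's schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
    """)
    return conn


def plan_with_index(conn):
    # Hypothetical index name; the column order follows the answer's advice.
    conn.execute("CREATE INDEX idx_purchase_customer_date_id "
                 "ON purchase (customer_id, date, id)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for detail in plan_with_index(make_db()):
        print(detail)
```

In SQLite an INTEGER PRIMARY KEY column is the rowid, which every index already carries, so listing id in the index is redundant there but harmless.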

Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.

Answer by Adriaan Stander

You could also try doing this using a subselect:

SELECT  c.*, p.*
FROM    customer c INNER JOIN
        (
            SELECT  customer_id,
                    MAX(date) MaxDate
            FROM    purchase
            GROUP BY customer_id
        ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
        purchase p ON   MaxDates.customer_id = p.customer_id
                    AND MaxDates.MaxDate = p.date

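The subselect finds each customer's last purchase date, and the joins then return the purchase rows made on that date. Because both joins are inner joins, customers without any purchases are left out, and a customer with several purchases on the same last date gets one row per purchase.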
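
A quick way to see this tie behavior is to run the same query in SQLite from Python. The helper names and sample rows are hypothetical; Alice has two purchases on her latest date, so she comes back twice, and Carol, who has no purchases, not at all.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def last_purchases_by_max_date(conn):
    # The derived table finds each customer's latest date; joining back on
    # (customer_id, date) returns every purchase made on that date.
    return conn.execute("""
        SELECT c.id, p.id, p.date
        FROM customer c INNER JOIN
            (
                SELECT customer_id, MAX(date) MaxDate
                FROM purchase
                GROUP BY customer_id
            ) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
            purchase p ON MaxDates.customer_id = p.customer_id
                      AND MaxDates.MaxDate = p.date
        ORDER BY c.id, p.id
    """).fetchall()


if __name__ == "__main__":
    for row in last_purchases_by_max_date(make_db()):
        print(row)
```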

Answer by Madalina Dragomir

You haven't specified the database. If it is one that supports analytic (window) functions, this approach may be faster than the GROUP BY one (definitely faster in Oracle, most likely faster in later SQL Server editions; I don't know about others).

Syntax in SQL Server would be:

SELECT c.*, p.*
FROM customer c INNER JOIN 
     (SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
             FROM purchase) p
ON (c.id = p.customer_id)
WHERE p.r = 1
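
Because RANK() gives rank 1 to every row that ties on the latest date, the query above can return more than one purchase per customer. If exactly one row is needed, a common variant is ROW_NUMBER() with a tie-breaking column in the ORDER BY. The sketch below compares the two in SQLite (window functions need SQLite 3.25 or later); the helper names, the WINDOWS table, and the sample rows, in which Alice has two purchases on the same date, are hypothetical.

```python
import sqlite3

WINDOWS = {
    # As in the answer: every purchase on the latest date gets rank 1.
    "rank": "RANK() OVER (PARTITION BY customer_id ORDER BY date DESC)",
    # Exactly one row numbered 1 per customer; ties go to the higher id.
    "row_number": "ROW_NUMBER() OVER (PARTITION BY customer_id "
                  "ORDER BY date DESC, id DESC)",
}


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def latest_by_window(conn, kind):
    window = WINDOWS[kind]  # only these fixed strings are put into the SQL
    return conn.execute(f"""
        SELECT c.id, p.id
        FROM customer c INNER JOIN
            (SELECT {window} AS r, * FROM purchase) p
        ON (c.id = p.customer_id)
        WHERE p.r = 1
        ORDER BY c.id, p.id
    """).fetchall()


if __name__ == "__main__":
    print(latest_by_window(make_db(), "rank"))
    print(latest_by_window(make_db(), "row_number"))
```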

Answer by Stefan Haberl

Another approach would be to use a NOT EXISTS condition in your join condition to test for later purchases (here, purchases with a higher id):

SELECT *
FROM customer c
LEFT JOIN purchase p ON (
       c.id = p.customer_id
   AND NOT EXISTS (
     SELECT 1 FROM purchase p1
     WHERE p1.customer_id = c.id
     AND p1.id > p.id
   )
)
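
This query decides "later" by comparing ids (p1.id > p.id), so it relies on ids increasing with purchase date, which is exactly the assumption raised in the question. The SQLite sketch below, with hypothetical helper names and sample rows, shows that the LEFT JOIN keeps Carol (no purchases) with NULL purchase columns, and what happens when a backdated purchase receives a higher id.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def latest_by_id(conn):
    # "Later" means a higher id here, not a later date.
    return conn.execute("""
        SELECT c.id, p.id
        FROM customer c
        LEFT JOIN purchase p ON (
               c.id = p.customer_id
           AND NOT EXISTS (
             SELECT 1 FROM purchase p1
             WHERE p1.customer_id = c.id
             AND p1.id > p.id
           )
        )
        ORDER BY c.id
    """).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(latest_by_id(conn))
    # Hypothetical backdated purchase: higher id, but an older date.
    conn.execute("INSERT INTO purchase VALUES (14, 2, 104, '2019-12-31')")
    print(latest_by_id(conn))
```

After the backdated insert, purchase 14 is reported as Bob's latest even though purchase 13 has the later date.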

Answer by Mathee

I found this thread as a solution to my problem.

But when I tried them, the performance was low. Below is my suggestion for better performance.

WITH MaxDates AS (
    SELECT customer_id,
           MAX(date) MaxDate
    FROM purchase
    GROUP BY customer_id
)
SELECT c.*, M.*
FROM customer c
INNER JOIN MaxDates AS M ON c.id = M.customer_id

Hope this will be helpful.
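
Note that this query returns only each customer's latest purchase date, not the rest of the purchase row. The sketch below (SQLite via Python; helper names and sample rows are hypothetical) runs it and lists the result columns. To also get item_id or the purchase id, MaxDates would still have to be joined back to purchase, as in the subselect answer above.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def max_dates(conn):
    cur = conn.execute("""
        WITH MaxDates AS (
            SELECT customer_id, MAX(date) MaxDate
            FROM purchase
            GROUP BY customer_id
        )
        SELECT c.*, M.*
        FROM customer c
        INNER JOIN MaxDates AS M ON c.id = M.customer_id
        ORDER BY c.id
    """)
    # cursor.description holds one tuple per result column; [0] is its name.
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


if __name__ == "__main__":
    print(max_dates(make_db()))
```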

Answer by Tate Thurston

If you're using PostgreSQL, you can use DISTINCT ON to find the first row in a group.

SELECT customer.*, purchase.*
FROM customer
JOIN (
   SELECT DISTINCT ON (customer_id) *
   FROM purchase
   ORDER BY customer_id, date DESC
) purchase ON purchase.customer_id = customer.id

PostgreSQL Docs - Distinct On

Note that the DISTINCT ON field(s), here customer_id, must match the leftmost field(s) in the ORDER BY clause.

Caveat: This is a nonstandard clause.
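
Since most other databases (SQLite included) have no DISTINCT ON, here is a small Python emulation of its semantics for illustration, not PostgreSQL's implementation: sort by the ORDER BY keys, then keep the first row of each customer_id run. The function name and the purchase dicts are hypothetical sample data. The sketch also sorts by id DESC after date DESC; adding the same tie-breaker to the PostgreSQL ORDER BY makes the chosen row deterministic when two purchases share a date.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on_latest(purchases):
    """Emulate: SELECT DISTINCT ON (customer_id) * FROM purchase
                ORDER BY customer_id, date DESC, id DESC"""
    # Python's sort is stable, so sort by the secondary keys first
    # (date, id descending), then by the primary key (customer_id ascending).
    ordered = sorted(purchases, key=itemgetter("date", "id"), reverse=True)
    ordered.sort(key=itemgetter("customer_id"))
    # DISTINCT ON (customer_id): keep only the first row of each group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter("customer_id"))]


purchases = [
    {"id": 10, "customer_id": 1, "item_id": 100, "date": "2020-01-05"},
    {"id": 11, "customer_id": 1, "item_id": 101, "date": "2020-02-01"},
    {"id": 12, "customer_id": 1, "item_id": 102, "date": "2020-02-01"},
    {"id": 13, "customer_id": 2, "item_id": 103, "date": "2020-03-10"},
]

if __name__ == "__main__":
    for row in distinct_on_latest(purchases):
        print(row)
```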

Answer by Rahul Murari

Try this; it will help.

I have used this in my project.

SELECT *
FROM customer c
OUTER APPLY (
    SELECT TOP 1 *
    FROM purchase pi
    WHERE pi.customer_id = c.Id
    ORDER BY pi.Id DESC
) AS [LastPurchasePrice]
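
OUTER APPLY is SQL Server syntax. SQLite has no APPLY, but a correlated subquery with LIMIT 1 inside a LEFT JOIN condition gives a comparable result, which also touches on the question's LIMIT 1 idea. This is a swapped-in pattern, not the answer's code; like the answer, it orders by id and therefore assumes ids increase with date. Helper names and sample rows are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def last_purchase_limit_1(conn):
    # For each customer, the subquery picks the single highest purchase id
    # (ORDER BY ... LIMIT 1); the LEFT JOIN keeps customers without purchases.
    return conn.execute("""
        SELECT c.id, c.name, p.id, p.item_id
        FROM customer c
        LEFT JOIN purchase p ON p.id = (
            SELECT pi.id FROM purchase pi
            WHERE pi.customer_id = c.id
            ORDER BY pi.id DESC
            LIMIT 1
        )
        ORDER BY c.id
    """).fetchall()


if __name__ == "__main__":
    for row in last_purchase_limit_1(make_db()):
        print(row)
```

If ids are not guaranteed to follow dates, ordering the subquery by date DESC, id DESC instead keeps the same shape.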

Answer by Mark

Tested on SQLite:

SELECT c.*, p.*, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p
ON c.id = p.customer_id
GROUP BY c.id

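The max() aggregate function will make sure that the latest purchase is selected from each group: when max() is the only aggregate, SQLite takes the other (bare) columns from the row holding the maximum. This assumes that the date column is in a format for which max() gives the latest date, such as ISO-8601 text, which is normally the case. If several purchases share the latest date, SQLite returns one of them without specifying which. Note that max(p.date, p.id) does not break such ties: with two or more arguments, SQLite's max() is a scalar function, not an aggregate.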

In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).

The LEFT OUTER JOIN (as opposed to INNER JOIN) will make sure that customers who have never made a purchase are also included.
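
The sketch below runs this query in SQLite from Python with hypothetical helper names and sample rows (Alice has two purchases on her latest date, Carol has none). It also shows that max() with two arguments is evaluated once per row rather than once per group, which is why it cannot serve as the tie-breaker.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def bare_column_max(conn):
    # max(p.date) is the only aggregate, so the bare column p.id comes from
    # a row holding each customer's maximum date (ties: one of them).
    return conn.execute("""
        SELECT c.id, p.id, max(p.date)
        FROM customer c
        LEFT OUTER JOIN purchase p ON c.id = p.customer_id
        GROUP BY c.id
        ORDER BY c.id
    """).fetchall()


def two_arg_max_rows(conn):
    # With two arguments max() is a scalar function: one result row per purchase.
    return conn.execute("SELECT max(p.date, p.id) FROM purchase p").fetchall()


if __name__ == "__main__":
    print(bare_column_max(make_db()))
    print(len(two_arg_max_rows(make_db())))
```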

Answer by Milad Shahbazi

Please try this:

SELECT
    c.id,
    c.name,
    (SELECT pi.item_id FROM purchase pi WHERE pi.id = MAX(p.id)) AS [LastPurchaseItemId]
FROM customer c INNER JOIN purchase p
    ON c.id = p.customer_id
GROUP BY c.id, c.name;
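
The query above relies on an aggregate over the outer query's rows (MAX(p.id)) inside a correlated subquery, and, like some other answers, on ids increasing with date. An alternative way to express the same "highest id per customer" idea is to join to a derived table of MAX(id) per customer, sketched here in SQLite via Python with hypothetical helper names and sample rows.

```python
import sqlite3


def make_db():
    """In-memory database with the question's schema and hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               item_id INTEGER, date TEXT);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO purchase VALUES
            (10, 1, 100, '2020-01-05'),
            (11, 1, 101, '2020-02-01'),
            (12, 1, 102, '2020-02-01'),
            (13, 2, 103, '2020-03-10');
    """)
    return conn


def last_item_by_max_id(conn):
    # m holds each customer's highest purchase id; joining purchase on that
    # id returns exactly one purchase row per customer who has purchases.
    return conn.execute("""
        SELECT c.id, c.name, pi.item_id
        FROM customer c
        INNER JOIN (
            SELECT customer_id, MAX(id) AS max_id
            FROM purchase
            GROUP BY customer_id
        ) m ON m.customer_id = c.id
        INNER JOIN purchase pi ON pi.id = m.max_id
        ORDER BY c.id
    """).fetchall()


if __name__ == "__main__":
    for row in last_item_by_max_id(make_db()):
        print(row)
```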