Original source: http://stackoverflow.com/questions/2111384/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
SQL join: selecting the last records in a one-to-many relationship
Asked by netvope
Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?
Please use these table/column names in your answer:
- customer: id, name
- purchase: id, customer_id, item_id, date
And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?
If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?
Answer by Bill Karwin
This is an example of the greatest-n-per-group problem, which has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
(p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;
Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or, in the case of ties, a later id). When we find that to be true, p1 is the most recent purchase for that customer.
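To see the tie-breaking rule at work, here is a minimal sketch that runs the query above through Python's sqlite3 module. The sample rows and the names Ann and Bob are made up for illustration. Two of Ann's purchases share a date, so only the id comparison decides between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 1, 12, '2020-03-01'),  -- same date as id 2: the id breaks the tie
        (4, 2, 10, '2020-02-01');
""")

# p2 only matches rows that are "later" than p1; rows of p1 with no such
# match (p2.id IS NULL) are the latest purchase of their customer.
latest = conn.execute("""
    SELECT c.name, p1.id, p1.date
    FROM customer c
    JOIN purchase p1 ON c.id = p1.customer_id
    LEFT OUTER JOIN purchase p2 ON c.id = p2.customer_id
        AND (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id))
    WHERE p2.id IS NULL
    ORDER BY c.id
""").fetchall()
print(latest)
```

With this data the query returns purchase 3 for Ann and purchase 4 for Bob, one row per customer.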
Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan, e.g. EXPLAIN on MySQL.
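As a rough sketch of that workflow, the snippet below creates the compound index and asks the engine for its plan. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for MySQL's EXPLAIN, and the index name purchase_customer_date_id is a hypothetical choice. The plan text differs between engines and versions, so treat the printed output as something to inspect rather than a fixed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Hypothetical index name; the column order follows the text above.
    CREATE INDEX purchase_customer_date_id
        ON purchase (customer_id, date, id);
""")

query = """
    SELECT c.*, p1.*
    FROM customer c
    JOIN purchase p1 ON c.id = p1.customer_id
    LEFT OUTER JOIN purchase p2 ON c.id = p2.customer_id
        AND (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id))
    WHERE p2.id IS NULL
"""

# Each plan row ends with a human-readable description of one step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step[-1])

# PRAGMA index_list reports the indexes on a table (the name is column 1).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('purchase')")]
print(index_names)
```

Whether the planner actually picks the index depends on the engine and on the data, which is why checking the plan on your own platform matters.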
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.
Answer by Adriaan Stander
You could also try doing this using a subselect:
SELECT c.*, p.*
FROM customer c INNER JOIN
(
SELECT customer_id,
MAX(date) MaxDate
FROM purchase
GROUP BY customer_id
) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
purchase p ON MaxDates.customer_id = p.customer_id
AND MaxDates.MaxDate = p.date
The subselect finds each customer's last purchase date, and the joins match that date back to the customer and purchase rows.
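One thing worth checking with this approach is what happens when a customer has two purchases on the same latest date. The sketch below runs the query through Python's sqlite3 module on made-up rows (the names Ann and Bob are illustrative only). Because the join matches on the date alone, both of Ann's tied purchases come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only; purchases 2 and 3 share Ann's latest date.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 1, 12, '2020-03-01'),
        (4, 2, 10, '2020-02-01');
""")

rows = conn.execute("""
    SELECT c.name, p.id
    FROM customer c
    INNER JOIN (
        SELECT customer_id, MAX(date) AS MaxDate
        FROM purchase
        GROUP BY customer_id
    ) MaxDates ON c.id = MaxDates.customer_id
    INNER JOIN purchase p ON MaxDates.customer_id = p.customer_id
        AND MaxDates.MaxDate = p.date
    ORDER BY c.id, p.id
""").fetchall()
print(rows)
```

If dates are unique per customer this is not an issue; otherwise an extra tie-breaker (for example on id) is needed to get exactly one row per customer.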
Answer by Madalina Dragomir
You haven't specified the database. If it is one that supports analytic functions, this approach may be faster than the GROUP BY one (definitely faster in Oracle, most likely faster in recent SQL Server versions; I don't know about others).
Syntax in SQL Server would be:
SELECT c.*, p.*
FROM customer c INNER JOIN
(SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
FROM purchase) p
ON (c.id = p.customer_id)
WHERE p.r = 1
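A small sketch of how the ranking function treats ties, run through Python's sqlite3 module rather than SQL Server. It assumes an SQLite build with window-function support (3.25 or later), and the sample rows are made up. RANK() gives tied rows the same rank, so both of Ann's same-day purchases pass the r = 1 filter; swapping in ROW_NUMBER() with id as a secondary sort key is one way to keep a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only; purchases 2 and 3 share Ann's latest date.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 1, 12, '2020-03-01'),
        (4, 2, 10, '2020-02-01');
""")

# The placeholders are filled only with the fixed strings below.
QUERY = """
    SELECT c.name, p.id
    FROM customer c
    INNER JOIN (
        SELECT {fn} OVER (PARTITION BY customer_id ORDER BY {order}) AS r, *
        FROM purchase
    ) p ON c.id = p.customer_id
    WHERE p.r = 1
    ORDER BY c.id, p.id
"""

with_rank = conn.execute(
    QUERY.format(fn="RANK()", order="date DESC")).fetchall()
with_row_number = conn.execute(
    QUERY.format(fn="ROW_NUMBER()", order="date DESC, id DESC")).fetchall()
print(with_rank)        # tied purchases both rank first
print(with_row_number)  # exactly one row per customer
```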
Answer by Stefan Haberl
Another approach would be to use a NOT EXISTS condition in your join condition to test for later purchases:
SELECT *
FROM customer c
LEFT JOIN purchase p ON (
c.id = p.customer_id
AND NOT EXISTS (
SELECT 1 FROM purchase p1
WHERE p1.customer_id = c.id
AND p1.id > p.id
)
)
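Here is a minimal sketch of that query run through Python's sqlite3 module, with made-up rows. It includes a customer (the illustrative name Cy) who has never bought anything, to show that the LEFT JOIN keeps such customers with NULL purchase columns. Note that this variant treats the highest purchase id as the latest purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only; Cy has no purchases.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 1, 12, '2020-03-01'),
        (4, 2, 10, '2020-02-01');
""")

# A purchase row joins only if no purchase with a higher id exists
# for the same customer.
rows = conn.execute("""
    SELECT c.name, p.id
    FROM customer c
    LEFT JOIN purchase p ON (
        c.id = p.customer_id
        AND NOT EXISTS (
            SELECT 1 FROM purchase p1
            WHERE p1.customer_id = c.id
            AND p1.id > p.id
        )
    )
    ORDER BY c.id
""").fetchall()
print(rows)
```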
Answer by Mathee
I found this thread as a solution to my problem.
But when I tried them the performance was low. Below is my suggestion for better performance.
With MaxDates as (
SELECT customer_id,
MAX(date) MaxDate
FROM purchase
GROUP BY customer_id
)
SELECT c.*, M.*
FROM customer c INNER JOIN
MaxDates as M ON c.id = M.customer_id
Hope this will be helpful.
Answer by Tate Thurston
If you're using PostgreSQL you can use DISTINCT ON to find the first row in a group.
SELECT customer.*, purchase.*
FROM customer
JOIN (
SELECT DISTINCT ON (customer_id) *
FROM purchase
ORDER BY customer_id, date DESC
) purchase ON purchase.customer_id = customer.id
Note that the DISTINCT ON field(s), here customer_id, must match the leftmost field(s) in the ORDER BY clause.
Caveat: This is a nonstandard clause.
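Since DISTINCT ON is PostgreSQL-specific, the following is only a plain-Python sketch of what the clause does conceptually, not PostgreSQL itself: sort the rows as the ORDER BY says, then keep the first row of each customer_id run. The sample rows are made up and deliberately contain no date ties within a customer.

```python
from itertools import groupby
from operator import itemgetter

# Made-up purchase rows, with no date ties within a customer.
purchases = [
    {"id": 1, "customer_id": 1, "item_id": 10, "date": "2020-01-01"},
    {"id": 2, "customer_id": 1, "item_id": 11, "date": "2020-03-01"},
    {"id": 3, "customer_id": 2, "item_id": 10, "date": "2020-02-01"},
]

# ORDER BY customer_id, date DESC: sort by the secondary key first, then
# by the primary key, relying on Python's sort being stable.
ordered = sorted(purchases, key=itemgetter("date"), reverse=True)
ordered.sort(key=itemgetter("customer_id"))

# DISTINCT ON (customer_id): keep the first row of each customer_id run.
last_purchase = [next(rows)
                 for _, rows in groupby(ordered, key=itemgetter("customer_id"))]
last_ids = [row["id"] for row in last_purchase]
print(last_ids)
```

The grouping only works because the rows are already sorted by customer_id, which mirrors the rule that the DISTINCT ON fields must lead the ORDER BY.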
Answer by Rahul Murari
Try this; it will help.
I have used this in my project.
SELECT
*
FROM
customer c
OUTER APPLY(SELECT top 1 * FROM purchase pi
WHERE pi.customer_id = c.Id order by pi.Id desc) AS [LastPurchasePrice]
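OUTER APPLY with TOP 1 is SQL Server syntax. As a sketch of the same idea on an engine without it, the snippet below uses a correlated subquery with ORDER BY ... LIMIT 1 in SQLite, run through Python's sqlite3 module. This is a swapped-in technique, not the query above, and the sample rows are made up. Like the original, it takes the highest purchase id as the last purchase, which is the situation the question describes for LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only; Cy has no purchases.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 1, 12, '2020-03-01'),
        (4, 2, 10, '2020-02-01');
""")

# The subquery picks the single highest purchase id per customer; it
# yields NULL for customers without purchases, so the LEFT JOIN keeps them.
rows = conn.execute("""
    SELECT c.name, p.id, p.date
    FROM customer c
    LEFT JOIN purchase p ON p.id = (
        SELECT p2.id FROM purchase p2
        WHERE p2.customer_id = c.id
        ORDER BY p2.id DESC
        LIMIT 1
    )
    ORDER BY c.id
""").fetchall()
print(rows)
```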
Answer by Mark
Tested on SQLite:
SELECT c.*, p.*, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p
ON c.id = p.customer_id
GROUP BY c.id
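The max() aggregate function will make sure that the latest purchase is selected from each group (this assumes that the date column is in a format whereby max() gives the latest, which is normally the case). Note that max(p.date, p.id) does not handle purchases with the same date: with two or more arguments, SQLite's max() is a scalar function rather than an aggregate, so ties need a separate tie-breaker such as the id comparison in the self-join answer above.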
In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).
The LEFT OUTER JOIN (as opposed to INNER JOIN) will make sure that customers that have never made a purchase are also included.
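A minimal sketch of this behaviour using Python's sqlite3 module and made-up rows. It relies on an SQLite-specific rule: when a query has a single max() aggregate, the other selected columns are taken from the row that holds the maximum. Other engines generally reject or handle this differently. The illustrative customer Cy has no purchases and still appears thanks to the LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           item_id INTEGER, date TEXT);
    -- Illustrative data only, with no date ties; Cy has no purchases.
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, 10, '2020-01-01'),
        (2, 1, 11, '2020-03-01'),
        (3, 2, 10, '2020-02-01');
""")

# p.id is a "bare" column: SQLite takes it from the row with the max date.
rows = conn.execute("""
    SELECT c.name, p.id, max(p.date)
    FROM customer c
    LEFT OUTER JOIN purchase p ON c.id = p.customer_id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)
```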
Answer by Milad Shahbazi
Please try this:
SELECT
c.Id,
c.name,
-- assumes purchase has a price column, which the question's schema does not list
(SELECT pi.price FROM purchase pi WHERE pi.Id = MAX(p.Id)) AS [LastPurchasePrice]
FROM customer c INNER JOIN purchase p
ON c.Id = p.customer_id
GROUP BY c.Id,c.name;