Select most recent row with GROUP BY in MySQL
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/5688063/
Asked by Ryan
I'm trying to select each user with their most recent payment. The query I have now selects the user's first payment. That is, if a user has made two payments and the payment.id values are 10 and 11, the query selects the user with the info for payment id 10, not 11.
SELECT users.*, payments.method, payments.id AS payment_id
FROM `users`
LEFT JOIN `payments` ON users.id = payments.user_id
GROUP BY users.id
I've added ORDER BY payments.id, but the query seems to ignore it and still selects the first payment.
All help appreciated. Thanks.
Answered by eggyal
You want the groupwise maximum; in essence, group the payments table to identify the maximal records, then join the result back to the payments table to fetch the other columns:
SELECT users.*, payments.method, payments.id AS payment_id
FROM payments NATURAL JOIN (
    SELECT user_id, MAX(id) AS id
    FROM payments
    GROUP BY user_id
) t RIGHT JOIN users ON users.id = t.user_id
Note that MAX(id) may not be the "most recent payment", depending on your application and schema: it's usually better to determine "most recent" based on a TIMESTAMP column than on synthetic identifiers such as an AUTO_INCREMENT primary key column.
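As a minimal sketch of the timestamp-based variant, assume payments has a hypothetical created_at TIMESTAMP column (it is not part of the schema shown in the question). The same groupwise-maximum pattern then takes MAX(created_at) instead of MAX(id):

```sql
-- Assumption: payments.created_at is a hypothetical TIMESTAMP column
-- recording when each payment was made; the question does not show it.
SELECT users.*, p.method, p.id AS payment_id
FROM users
LEFT JOIN (
    -- latest payment time per user
    SELECT user_id, MAX(created_at) AS created_at
    FROM payments
    GROUP BY user_id
) latest
    ON latest.user_id = users.id
LEFT JOIN payments p
    ON  p.user_id    = latest.user_id
    AND p.created_at = latest.created_at;
```

Users without payments are kept, with NULL payment columns. If two payments of the same user share the same created_at value, both rows come back, so a tie-breaker (for example the id) may still be needed.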
Answered by DanH
I read the following solution on SO long ago and can't find the link to credit it, but here goes:
SELECT users.*, payments.method, payments.id AS payment_id, payments2.id
FROM users
JOIN payments
    ON users.id = payments.user_id
-- payments2 is a second alias of the payments table (self-join)
LEFT JOIN payments AS payments2
    ON payments.user_id = payments2.user_id
    AND payments.id < payments2.id
WHERE payments2.id IS NULL
To understand how this works, just drop the WHERE payments2.id IS NULL clause and you'll see what is happening. For instance, it could produce the following output (I haven't built the schema to test this, so it's pseudo-output). Assume there are the following records in payments:
id | user_id | method
1 | 1 | VISA
2 | 1 | VISA
3 | 1 | VISA
4 | 1 | VISA
And the above SQL (without the WHERE payments2.id IS NULL clause) should produce:
users.id | payments.method | payments.id | payments2.id
1 | VISA | 1 | 2
1 | VISA | 1 | 3
1 | VISA | 1 | 4
1 | VISA | 2 | 3
1 | VISA | 2 | 4
1 | VISA | 3 | 4
1 | VISA | 4 | NULL
As you can see, the last line is the desired result: since there is no payments2.id > 4, the LEFT JOIN yields a NULL payments2.id for that row.
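The pseudo-output above can be checked with a small script. This is only a sketch: the table definitions, the name column, and the second user are assumptions made up for illustration, since the question does not show the real schema.

```sql
-- Hypothetical minimal schema, just enough to run the self-join.
CREATE TABLE users (
    id   INT PRIMARY KEY,
    name VARCHAR(50)
);
CREATE TABLE payments (
    id      INT PRIMARY KEY,
    user_id INT NOT NULL,
    method  VARCHAR(20)
);

INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
INSERT INTO payments (id, user_id, method) VALUES
    (1, 1, 'VISA'), (2, 1, 'VISA'), (3, 1, 'VISA'), (4, 1, 'VISA'),
    (5, 2, 'PAYPAL');

-- p2 matches every later payment of the same user, so a row with
-- no match (p2.id IS NULL) is that user's most recent payment.
SELECT u.id AS user_id, p.method, p.id AS payment_id
FROM users u
JOIN payments p
    ON u.id = p.user_id
LEFT JOIN payments p2
    ON  p.user_id = p2.user_id
    AND p.id < p2.id
WHERE p2.id IS NULL
ORDER BY u.id;
-- Returns (1, 'VISA', 4) and (2, 'PAYPAL', 5).
```

Because users is inner-joined to payments here, a user with no payments drops out of the result; the query in the question used a LEFT JOIN for that first join, which would keep such users.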
I've found this solution to be much faster (in my early tests) than the accepted answer.
Using a different schema but a similar query, on 16095 records:
select as1.*, as2.id
from allocation_status as1
left join allocation_status as2
on as1.allocation_id = as2.allocation_id
and as1.id < as2.id
where as2.id is null;
16095 rows affected, taking 4.1ms
Compared to the accepted answer's MAX / subquery approach:
SELECT as1.*
FROM allocation_status as1
JOIN (
SELECT max(id) as id
FROM allocation_status
group by allocation_id
) as_max on as1.id = as_max.id
16095 rows affected, taking 14.8ms
Answered by Rich Harding
I've just been dealing with pretty much exactly the same problem and found these answers helpful. My testing seems to suggest you can make it slightly simpler than the accepted answer, viz.:
SELECT u.*, p.method, p.id AS payment_id
FROM `users` u, `payments` p
WHERE u.id = p.user_id
AND p.id = (SELECT MAX(p2.id) FROM payments p2
WHERE p2.user_id = u.id);
I haven't performance-tested the differences, but the database I'm working on has over 50,000 users and over 60,000 payments, and the query runs in 0.024 seconds.
Answered by Elton da Costa
My solution:
SELECT
u.codigo,
u.nome,
max(r.latitude),
max(r.longitude),
max(r.data_criacao)
from TAB_REGISTRO_COORDENADAS r
inner join TAB_USUARIO u
on u.codigo = r.cd_usuario
group by u.codigo
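One caveat worth checking with this approach: each MAX() is computed independently, so the values returned for a user can come from different rows. A minimal sketch, assuming hypothetical column types and sample data for TAB_REGISTRO_COORDENADAS (the answer does not show the real schema):

```sql
-- Hypothetical column types and sample rows, for illustration only.
CREATE TABLE TAB_REGISTRO_COORDENADAS (
    cd_usuario   INT,
    latitude     DECIMAL(9,6),
    longitude    DECIMAL(9,6),
    data_criacao DATETIME
);
INSERT INTO TAB_REGISTRO_COORDENADAS VALUES
    (1, 10.0, 50.0, '2020-01-01 00:00:00'),
    (1,  5.0, 60.0, '2020-02-01 00:00:00');

-- Independent aggregates: yields latitude 10 with longitude 60,
-- a pair that no stored row contains.
SELECT cd_usuario, MAX(latitude), MAX(longitude), MAX(data_criacao)
FROM TAB_REGISTRO_COORDENADAS
GROUP BY cd_usuario;

-- Groupwise maximum on data_criacao: yields the actual latest row,
-- latitude 5 with longitude 60, dated 2020-02-01.
SELECT r.*
FROM TAB_REGISTRO_COORDENADAS r
JOIN (
    SELECT cd_usuario, MAX(data_criacao) AS data_criacao
    FROM TAB_REGISTRO_COORDENADAS
    GROUP BY cd_usuario
) latest
    ON  r.cd_usuario   = latest.cd_usuario
    AND r.data_criacao = latest.data_criacao;
```

If only the latest timestamp per user is needed, the per-column MAX() query is fine; the second form matters when the other columns must come from that same row.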
Answered by Rob Bailey
Taking this one step further, we can also use:
select payment_id, cust_id, amount, payment_method
from my_table where payment_id in
(
select max(payment_id) from my_table group by cust_id
);
...but this query also takes way too long in my context. The inner select is smoking fast, but the outer one takes a while, even though the inner returns only 124 rows. Ideas?
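One idea, sketched under the assumption that the slowdown comes from how the server evaluates the IN (subquery): older MySQL versions (before 5.6) were known to re-evaluate such subqueries for each outer row. The same filter can be written as a join against a derived table, as in the accepted answer. Table and column names are taken from the query above; whether this actually helps depends on the server version and the available indexes.

```sql
SELECT t.payment_id, t.cust_id, t.amount, t.payment_method
FROM my_table t
JOIN (
    -- latest payment_id per customer, computed as a derived table
    SELECT MAX(payment_id) AS payment_id
    FROM my_table
    GROUP BY cust_id
) latest
    ON t.payment_id = latest.payment_id;
```

Running EXPLAIN on both forms should show whether the subquery is being treated as a dependent subquery in the IN version.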
Answered by johnnyarguelles
I have come across this before. GROUP BY is intended more for aggregate expressions or identical records. My research suggests it is best practice to do something like this:
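SELECT u.*, p.method, p.id AS payment_id
FROM (
    SELECT DISTINCT users.id
    FROM users
) ur
-- full user row for each distinct id, so that u.* resolves
JOIN users u
    ON u.id = ur.id
JOIN payments p
    ON p.id =
    (
        -- latest payment id for this user
        SELECT pt.id
        FROM payments pt
        WHERE pt.user_id = ur.id
        ORDER BY
            pt.id DESC
        LIMIT 1
    )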
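For comparison, the same "latest row per user" selection can be sketched with a window function. This is a different technique from the ORDER BY ... LIMIT 1 subquery above, and it assumes MySQL 8.0 or later, where ROW_NUMBER() is available; the alias names ranked and rn are made up for illustration.

```sql
-- Requires MySQL 8.0+ (window functions).
SELECT u.*, ranked.method, ranked.payment_id
FROM users u
JOIN (
    SELECT p.user_id,
           p.method,
           p.id AS payment_id,
           -- rn = 1 marks each user's payment with the highest id
           ROW_NUMBER() OVER (PARTITION BY p.user_id
                              ORDER BY p.id DESC) AS rn
    FROM payments p
) ranked
    ON  ranked.user_id = u.id
    AND ranked.rn = 1;
```

As with the query above, users without any payments are not returned; switching the outer JOIN to a LEFT JOIN would keep them with NULL payment columns.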