Original question: http://stackoverflow.com/questions/15421616/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Select only the last value using GROUP BY in MySQL
Asked by Cito
I have a table with attendance data for some events. A new row is stored every time a user sends a new attendance value, so the data looks like this:
mysql> SELECT id_branch_channel, id_member, attendance, timestamp, id_member FROM view_event_attendance WHERE id_event = 782;
+-------------------+-----------+------------+------------+-----------+
| id_branch_channel | id_member | attendance | timestamp | id_member |
+-------------------+-----------+------------+------------+-----------+
| 1326 | 131327 | 459 | 1363208604 | 131327 |
| 1326 | 131327 | 123 | 1363208504 | 131327 |
| 1326 | 131327 | 1 | 1363208459 | 131327 |
| 1326 | 93086 | 0 | NULL | 93086 |
| 1326 | 93087 | 0 | NULL | 93087 |
| 1326 | 93088 | 0 | NULL | 93088 |
| 1326 | 93093 | 0 | NULL | 93093 |
| 1326 | 99113 | 0 | NULL | 99113 |
| 1326 | 99135 | 0 | NULL | 99135 |
| 1326 | 99199 | 0 | NULL | 99199 |
| 1326 | 99200 | 0 | NULL | 99200 |
| 1326 | 131324 | 0 | NULL | 131324 |
| 1326 | 85850 | 0 | NULL | 85850 |
| 1326 | 93085 | 0 | NULL | 93085 |
+-------------------+-----------+------------+------------+-----------+
14 rows in set (0.00 sec)
(This is actually a view, which is why some of the fields are NULL.)
I can group by id_member so that I get only one row per member (that is, only the last attendance the user sent). However, when I do that, I get the first attendance the user sent, not the last one.
mysql> SELECT id_branch_channel, id_member, attendance, timestamp, id_member FROM view_event_attendance WHERE id_event = 782 GROUP BY id_event,id_member;
+-------------------+-----------+------------+------------+-----------+
| id_branch_channel | id_member | attendance | timestamp | id_member |
+-------------------+-----------+------------+------------+-----------+
| 1326 | 131327 | 1 | 1363208459 | 131327 |
| 1326 | 93086 | 0 | NULL | 93086 |
| 1326 | 131324 | 0 | NULL | 131324 |
| 1326 | 93087 | 0 | NULL | 93087 |
| 1326 | 93088 | 0 | NULL | 93088 |
| 1326 | 93093 | 0 | NULL | 93093 |
| 1326 | 99113 | 0 | NULL | 99113 |
| 1326 | 99135 | 0 | NULL | 99135 |
| 1326 | 85850 | 0 | NULL | 85850 |
| 1326 | 99199 | 0 | NULL | 99199 |
| 1326 | 93085 | 0 | NULL | 93085 |
| 1326 | 99200 | 0 | NULL | 99200 |
+-------------------+-----------+------------+------------+-----------+
12 rows in set (0.00 sec)
I already tried adding ORDER BY clauses, but they don't work at all. Any ideas?
Thanks in advance!
Edit: this is the script that creates the view:
CREATE OR REPLACE VIEW view_event_attendance
AS
SELECT
tbl_event.id_event,
tbl_member_event.id_member,
tbl_event.id_branch_channel,
tbl_member_event_attendance.id_member_event_attendance,
IF(ISNULL(tbl_member_event_attendance.attendance), 0, tbl_member_event_attendance.attendance) AS attendance,
tbl_member_event_attendance.timestamp
FROM
tbl_event
INNER JOIN
tbl_member_event ON tbl_member_event.id_event = tbl_event.id_event
LEFT OUTER JOIN
tbl_member_event_attendance ON tbl_member_event_attendance.id_member_event = tbl_member_event.id_member_event
ORDER BY
tbl_member_event_attendance.timestamp DESC;
EDIT 2:
Thanks a lot, MichaelBenjamin, but the problem with using subqueries is the size of the view:
mysql> DESCRIBE SELECT id_branch_channel, id_member, attendance, timestamp, id_member
-> FROM (select * from view_event_attendance order by timestamp desc) as whatever
-> WHERE id_event = 782
-> GROUP BY id_event,id_member;
+----+-------------+-----------------------------+--------+-----------------+-----------------+---------+------------------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------+--------+-----------------+-----------------+---------+------------------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16755 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tbl_member_event | index | id_event | id_event | 8 | NULL | 16346 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | tbl_event | eq_ref | PRIMARY | PRIMARY | 4 | video_staging.tbl_member_event.id_event | 1 | |
| 2 | DERIVED | tbl_member_event_attendance | ref | id_event_member | id_event_member | 4 | video_staging.tbl_member_event.id_member_event | 1 | Using index |
+----+-------------+-----------------------------+--------+-----------------+-----------------+---------+------------------------------------------------+-------+----------------------------------------------+
4 rows in set (0.08 sec)
As you can see, there are a lot of rows in my table, so I would rather not use subqueries.
EDIT 3:
But with a WHERE added to the subquery, it looks better:
mysql> DESCRIBE SELECT id_branch_channel, id_member, attendance, timestamp, id_member
-> FROM (select * from view_event_attendance where id_event = 782 order by timestamp desc) as whatever
-> WHERE id_event = 782
-> GROUP BY id_event,id_member;
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 14 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | tbl_event | const | PRIMARY | PRIMARY | 4 | | 1 | Using temporary; Using filesort |
| 2 | DERIVED | tbl_member_event | ref | id_event | id_event | 4 | | 12 | Using index |
| 2 | DERIVED | tbl_member_event_attendance | ref | id_event_member | id_event_member | 4 | video_staging.tbl_member_event.id_member_event | 1 | Using index |
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+----------------------------------------------+
4 rows in set (0.01 sec)
If I can't find anything else that avoids subqueries, I think I'll choose this as the answer.
Edit 4
After seeing the comments on that answer, I've decided to accept a different one. Here is the DESCRIBE output for both queries; I think it is obvious which is the better solution:
mysql> DESCRIBE SELECT
-> id_branch_channel,
-> id_member,
-> attendance,
-> timestamp,
-> id_member
-> FROM view_event_attendance AS t1
-> WHERE id_event = 782
-> AND timestamp = (SELECT MAX(timestamp)
-> FROM view_event_attendance AS t2
-> WHERE t1.id_member = t2.id_member
-> AND t1.id_event = t2.id_event
-> GROUP BY id_event, id_member)
-> OR timestamp IS NULL
-> GROUP BY id_event, id_member;
+----+--------------------+-----------------------------+--------+--------------------+--------------------------+---------+------------------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------------------+--------+--------------------+--------------------------+---------+------------------------------------------------+------+-----------------------------------------------------------+
| 1 | PRIMARY | tbl_event | index | PRIMARY | id_member_branch_channel | 4 | NULL | 208 | Using index; Using temporary; Using filesort |
| 1 | PRIMARY | tbl_member_event | ref | id_event | id_event | 4 | video_staging.tbl_event.id_event | 64 | Using index |
| 1 | PRIMARY | tbl_member_event_attendance | ref | id_event_member | id_event_member | 4 | video_staging.tbl_member_event.id_member_event | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | tbl_event | eq_ref | PRIMARY | PRIMARY | 4 | func | 1 | Using where; Using index; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | tbl_member_event | eq_ref | id_event,id_member | id_event | 8 | video_staging.tbl_event.id_event,func | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | tbl_member_event_attendance | ref | id_event_member | id_event_member | 4 | video_staging.tbl_member_event.id_member_event | 1 | Using where; Using index |
+----+--------------------+-----------------------------+--------+--------------------+--------------------------+---------+------------------------------------------------+------+-----------------------------------------------------------+
6 rows in set (0.00 sec)
mysql> DESCRIBE SELECT *
-> FROM (SELECT id_branch_channel, id_member, attendance, timestamp, id_event
-> FROM view_event_attendance
-> WHERE id_event = 782
-> ORDER BY timestamp desc
-> ) as whatever
-> GROUP BY id_event,id_member;
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 14 | Using temporary; Using filesort |
| 2 | DERIVED | tbl_event | const | PRIMARY | PRIMARY | 4 | | 1 | Using temporary; Using filesort |
| 2 | DERIVED | tbl_member_event | ref | id_event | id_event | 4 | | 12 | Using index |
| 2 | DERIVED | tbl_member_event_attendance | ref | id_event_member | id_event_member | 4 | video_staging.tbl_member_event.id_member_event | 1 | Using index |
+----+-------------+-----------------------------+-------+-----------------+-----------------+---------+------------------------------------------------+------+---------------------------------+
4 rows in set (0.00 sec)
Accepted answer by ysth
Use a simple GROUP BY id_member, but select:
substring(max(concat(from_unixtime(timestamp),attendance)) from 20) as attendance
This attaches attendance to the timestamp for each row in a group, in order to be able to select the desired timestamp/attendance with max() and then extract just the attendance.
What concat() returns is 19 characters of formatted timestamp (YYYY-mm-dd HH:MM:SS) with the attendance appended starting at character 20; the substring(... from 20) gets just the attendance from the (stringwise) maximum one for the group. You can remove the GROUP BY and just
select concat(from_unixtime(timestamp),attendance), timestamp, attendance
to get a better idea of how it uses max() to get the right attendance.
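The packing trick above can be traced outside the database. The following is a minimal Python sketch, not MySQL code: the helper names `packed` and `last_attendance` are hypothetical, the rows are the three entries for member 131327 from the question, and formatting in UTC is an assumption (MySQL's from_unixtime() uses the session time zone).

```python
from datetime import datetime, timezone

# (timestamp, attendance) rows for member 131327, taken from the question.
rows = [(1363208604, 459), (1363208504, 123), (1363208459, 1)]


def packed(ts, attendance):
    """Mimic concat(from_unixtime(timestamp), attendance); NULL in, NULL out."""
    if ts is None:
        return None
    # 19-character 'YYYY-mm-dd HH:MM:SS' prefix. UTC is an assumption here.
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return stamp + str(attendance)


def last_attendance(group):
    """Mimic substring(max(concat(...)) from 20) for one group of rows."""
    candidates = [p for p in (packed(ts, a) for ts, a in group) if p is not None]
    if not candidates:
        return None  # max() over nothing but NULLs is NULL
    # Character 20 in SQL (1-based) is index 19 in Python (0-based).
    return max(candidates)[19:]


result = last_attendance(rows)
print(result)
```

Because every timestamp is formatted to the same fixed width, the stringwise maximum is also the chronological maximum. Note that the extracted attendance is a string, and that a member whose only row has a NULL timestamp yields NULL rather than 0 under this scheme.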
Answer by Sarang
SUBSTRING_INDEX(SUBSTRING_INDEX(group_concat(%requiredfield%), ',', count(*)),',',-1)
This gets the last value of the 'required field' from any group_concat; if unsorted, it will be the last value in the table by default.
Could use group_concat_ws to account for possible null fields.
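To see what the nested SUBSTRING_INDEX calls do, here is a small Python sketch. The `substring_index` function is a hypothetical re-implementation written for illustration (it mirrors the positive and negative count behaviour of the MySQL function), and the value list stands in for one group's attendance values in insertion order.

```python
def substring_index(text, delim, count):
    """Illustrative stand-in for MySQL's SUBSTRING_INDEX(text, delim, count)."""
    parts = text.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Everything after the count-th delimiter, counting from the right.
        return delim.join(parts[count:])
    return ""


# One group's values, in the order group_concat would emit them (assumed).
values = [1, 123, 459]
grouped = ",".join(str(v) for v in values)  # plays the role of group_concat(...)
n = len(values)                             # plays the role of count(*)

# Keep the first n items, then take the last of those.
last_value = substring_index(substring_index(grouped, ",", n), ",", -1)
print(last_value)
```

The approach depends on the values themselves not containing the separator; a value with a comma in it would be split in the wrong place.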
Answer by Michael Benjamin
SELECT id_branch_channel, id_member, attendance, timestamp, id_member
FROM (select * from view_event_attendance order by timestamp desc) as whatever
WHERE id_event = 782
GROUP BY id_event,id_member;
EDIT: This may yield better performance:
SELECT *
FROM (SELECT id_branch_channel, id_member, attendance, timestamp, id_event
      FROM view_event_attendance
      WHERE id_event = 782
      ORDER BY timestamp desc
     ) as whatever
GROUP BY id_event,id_member;
As long as the result set fits into the InnoDB buffer pool, you will not see a significant performance drop.
Answer by JodyT
I see answers with JOINs and subqueries, but I believe a simple HAVING clause should do the trick:
SELECT
id_branch_channel,
id_member,
attendance,
timestamp,
id_member
FROM view_event_attendance
WHERE id_event = 782
GROUP BY id_event, id_member
HAVING MAX(timestamp) OR timestamp IS NULL;
EDIT: Added a check for IS NULL in case you also want to include those rows.
EDIT 2: Is it even necessary to group by id_event when you're already filtering down to one event?
EDIT 3: I don't know why this was downvoted; this SQL Fiddle shows it works.
EDIT 4: I have to apologise: @ysth is correct, the SQL Fiddle does not work correctly. I deserved the -1, but when you downvote, at least explain why so I can learn something as well.
The following works, but unfortunately it uses a subquery again and won't perform much better than the other solutions posted here.
SELECT
    id_branch_channel,
    id_member,
    attendance,
    timestamp,
    id_member
FROM view_event_attendance AS t1
WHERE id_event = 782
  -- parentheses keep the OR from bypassing the id_event filter
  AND (timestamp = (SELECT MAX(timestamp)
                    FROM view_event_attendance AS t2
                    WHERE t1.id_member = t2.id_member
                      AND t1.id_event = t2.id_event
                    GROUP BY id_event, id_member)
       OR timestamp IS NULL)
GROUP BY id_event, id_member;
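The correlated-subquery idea can be tried on a small in-memory dataset. This is a sketch using Python's sqlite3 module rather than MySQL; the table `attendance_rows` is a hypothetical stand-in for view_event_attendance, and the rows for event 783 are invented to show that the event filter still applies to NULL timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for view_event_attendance, reduced to four columns.
conn.execute(
    "CREATE TABLE attendance_rows "
    "(id_event INTEGER, id_member INTEGER, attendance INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO attendance_rows VALUES (?, ?, ?, ?)",
    [
        (782, 131327, 459, 1363208604),
        (782, 131327, 123, 1363208504),
        (782, 131327, 1, 1363208459),
        (782, 93086, 0, None),          # member with no attendance yet
        (783, 93086, 7, 1363200000),    # invented row for another event
        (783, 99999, 0, None),          # invented row for another event
    ],
)

query = """
SELECT id_member, attendance, timestamp
FROM attendance_rows AS t1
WHERE id_event = 782
  AND (timestamp = (SELECT MAX(timestamp)
                    FROM attendance_rows AS t2
                    WHERE t2.id_member = t1.id_member
                      AND t2.id_event = t1.id_event)
       OR timestamp IS NULL)
ORDER BY id_member
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Each member of event 782 appears once: with the most recent attendance if any was sent, or with the placeholder row otherwise.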
Answer by sgeddes
Here is one option (untested):
SELECT v.id_branch_channel, v.id_member, v.attendance, v.timestamp, v.id_member
FROM view_event_attendance v
JOIN (
    -- latest timestamp per event and member
    SELECT id_event, id_member, MAX(timestamp) maxtimestamp
    FROM view_event_attendance
    GROUP BY id_event, id_member ) m ON
    v.id_event = m.id_event AND
    v.id_member = m.id_member AND
    v.timestamp <=> m.maxtimestamp  -- NULL-safe, keeps members with no attendance
WHERE v.id_event = 782
GROUP BY v.id_member;
The concept is to get the MAX() of timestamp and use that field to JOIN on your view. You might not need all the fields; it really depends on your table structure. But this should get you going in the correct direction.
Answer by Steve
One way to do this is to use a window function and a subquery. If you add an entry to your select list such as row_number() over (partition by id_member order by timestamp desc), it will resolve to a number ordering the rows by timestamp (with 1 being the newest) within each id_member group (run it if this doesn't make sense; it will be clear). You can then select from this as a subquery where the extra column = 1, which will select only the row with the highest timestamp within each group.
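A runnable sketch of the row_number() approach, again using Python's sqlite3 module instead of MySQL. It assumes SQLite 3.25 or later for window-function support (in MySQL, window functions are available from 8.0 onward); the table `attendance_rows` and the alias `rn` are hypothetical names. With ORDER BY timestamp DESC, row number 1 goes to the most recent row in each partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for view_event_attendance, reduced to four columns.
conn.execute(
    "CREATE TABLE attendance_rows "
    "(id_event INTEGER, id_member INTEGER, attendance INTEGER, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO attendance_rows VALUES (?, ?, ?, ?)",
    [
        (782, 131327, 459, 1363208604),
        (782, 131327, 123, 1363208504),
        (782, 131327, 1, 1363208459),
        (782, 93086, 0, None),  # member with no attendance yet
    ],
)

query = """
SELECT id_member, attendance, timestamp
FROM (
    SELECT id_member, attendance, timestamp,
           ROW_NUMBER() OVER (PARTITION BY id_member
                              ORDER BY timestamp DESC) AS rn
    FROM attendance_rows
    WHERE id_event = 782
) AS ranked
WHERE rn = 1          -- keep only the newest row per member
ORDER BY id_member
"""
newest = conn.execute(query).fetchall()
print(newest)
```

Members whose only row has a NULL timestamp still get row number 1 for that row, so they are kept without a separate IS NULL check.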