Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/537223/
MySQL - Control which row is returned by a group by
Asked by benlumley
I have a database table like this:
id  version_id  field1  field2
1   1           texta   text1
1   2           textb   text2
2   1           textc   text3
2   2           textd   text4
2   3           texte   text5
In case it isn't obvious, the table holds several versions of each row (identified by id), plus some text data.
I want to query it and return, for each id, only the row with the highest version_id (so only the second and last rows in the table above).
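To pin down the expected result, here is the same group-wise maximum computed in plain Python over the sample rows. It is only a sketch of the semantics (the function name `latest_versions` is hypothetical), not a replacement for doing it in SQL.

```python
# Sample rows from the question: (id, version_id, field1, field2)
rows = [
    (1, 1, "texta", "text1"),
    (1, 2, "textb", "text2"),
    (2, 1, "textc", "text3"),
    (2, 2, "textd", "text4"),
    (2, 3, "texte", "text5"),
]


def latest_versions(rows):
    """Return, for each id, the row with the highest version_id, sorted by id."""
    best = {}
    for row in rows:
        row_id, version_id = row[0], row[1]
        # Keep this row if it is the first seen for its id, or if its
        # version_id beats the row kept so far for that id.
        if row_id not in best or version_id > best[row_id][1]:
            best[row_id] = row
    return sorted(best.values())


for row in latest_versions(rows):
    print(row)
```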
I've tried using GROUP BY while ordering by version_id DESC, but it seems to order after it's grouped, so this doesn't work.
Anyone got any ideas? I can't believe it can't be done!
UPDATE:
I came up with this, which works but uses a subquery:
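-- Caution: MySQL does not guarantee which row GROUP BY returns for the
-- non-aggregated columns, and this is rejected under ONLY_FULL_GROUP_BY
SELECT *
FROM (SELECT * FROM table ORDER BY version_id DESC) t1
GROUP BY t1.id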
Accepted answer by ???u
It's called selecting the group-wise maximum of a column. There are several different approaches to it in MySQL.
Here's how I would do it:
SELECT *
FROM (SELECT id, max(version_id) as version_id FROM table GROUP BY id) t1
INNER JOIN table t2 on t2.id=t1.id and t1.version_id=t2.version_id
This will be relatively efficient, though MySQL will create a temporary table in memory for the subquery. I assume you already have an index on (id, version_id) for this table.
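The join-on-max approach above can be tried end to end with a minimal sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for MySQL. The table name `versions` and index name are hypothetical, since `table` itself is a reserved word; the query selects `t2.*` rather than `*` so id and version_id are not returned twice.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
# "versions" is a hypothetical table name ("table" is reserved).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
# Composite index on (id, version_id), as the answer assumes exists.
conn.execute("CREATE UNIQUE INDEX versions_id_version ON versions (id, version_id)")
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)

# t1 holds the highest version_id per id; joining it back to the base
# table recovers the full rows that carry that maximum.
query = """
SELECT t2.*
FROM (SELECT id, MAX(version_id) AS version_id FROM versions GROUP BY id) t1
INNER JOIN versions t2 ON t2.id = t1.id AND t2.version_id = t1.version_id
ORDER BY t2.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```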
It's a deficiency in SQL that you more or less have to use a subquery for this type of problem (semi-joins are another example).
Subqueries are not well optimized in MySQL, but uncorrelated subqueries aren't so bad as long as they aren't so large that they get written to disk rather than kept in memory. Since the subquery in this query only has two int columns, it could grow to millions of rows before that happened, but the SELECT * subquery in your first query could hit this problem much sooner.
Answer by Chris J
I think this would do it, not sure if it is the best or fastest though.
SELECT * FROM table
WHERE (id, version_id) IN
(SELECT id, MAX(version_id) FROM table GROUP BY id)
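The row-constructor `IN` form can be checked the same way. This sketch again uses SQLite via Python's `sqlite3` as a stand-in for MySQL (row values in `IN` need SQLite 3.15 or later), with the hypothetical table name `versions`.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; "versions" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)

# Keep a row only if its (id, version_id) pair is one of the
# (id, max version_id) pairs produced by the uncorrelated subquery.
query = """
SELECT * FROM versions
WHERE (id, version_id) IN
      (SELECT id, MAX(version_id) FROM versions GROUP BY id)
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```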
Answer by Quassnoi
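SELECT id, version_id, field1, field2
FROM (
    -- st is TRUE when this row has the same id as the previous row; this relies on
    -- @prev = id being evaluated before @prev := id (an order MySQL leaves undefined)
    SELECT @prev = id AS st, (@prev := id), m.*
    FROM (
        (SELECT @prev := NULL) p,
        (
            SELECT *
            FROM mytable
            ORDER BY id DESC, version_id DESC
        ) m
    )
) m2
-- keep only the first row of each id, i.e. the one with its highest version_id
WHERE NOT IFNULL(st, FALSE);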
No correlated subqueries (only derived tables), and a single pass over a UNIQUE INDEX on mytable (id, version_id) if you have one (which I think you should).
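The user-variable query above is MySQL-specific, so it can't be run on SQLite. As a rough illustration of the same single-pass idea, here is a plain Python sketch (the function name `first_row_per_id` is hypothetical): walk the rows sorted by id DESC, version_id DESC and keep a row only when its id differs from the previous row's id, which is the job `@prev` does in the query.

```python
# Sample rows from the question: (id, version_id, field1, field2)
rows = [
    (1, 1, "texta", "text1"),
    (1, 2, "textb", "text2"),
    (2, 1, "textc", "text3"),
    (2, 2, "textd", "text4"),
    (2, 3, "texte", "text5"),
]


def first_row_per_id(rows):
    """Single pass over rows ordered by (id DESC, version_id DESC),
    keeping only the first row seen for each id."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
    kept = []
    prev = None  # plays the role of SELECT @prev := NULL
    for row in ordered:
        if row[0] != prev:  # like WHERE NOT IFNULL(@prev = id, FALSE)
            kept.append(row)
        prev = row[0]  # like (@prev := id)
    return kept


for row in first_row_per_id(rows):
    print(row)
```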
Answer by Patrick Savalle
This query will do the job without a group by:
SELECT * FROM table AS t
LEFT JOIN table AS t2
ON t.id=t2.id
AND t.version_id < t2.version_id
WHERE t2.id IS NULL
It does not need any temporary tables.
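The self-join trick works because a row has no partner with a larger version_id (so the LEFT JOIN leaves t2 NULL) exactly when it is the latest version of its id. A minimal sketch using SQLite via Python's `sqlite3` as a stand-in for MySQL, with the hypothetical table name `versions`; it selects `t.*` so the all-NULL t2 columns are left out.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; "versions" is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)

# Pair each row with every newer version of the same id; rows with no
# newer version get NULLs on the t2 side and survive the WHERE filter.
query = """
SELECT t.*
FROM versions AS t
LEFT JOIN versions AS t2
  ON t.id = t2.id AND t.version_id < t2.version_id
WHERE t2.id IS NULL
ORDER BY t.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```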
Answer by sumit kumar
You can also use analytic (window) functions, available in MySQL 8.0 and later, which give you more control:
select tmp.*
from (
    select id, version_id, field1, field2,
           rank() over (partition by id order by version_id desc) as rnk
    from table
) tmp
where tmp.rnk = 1
Depending on your data, you may prefer row_number() or dense_rank() instead of rank(): with rank() and dense_rank(), rows tied on the highest version_id all get rank 1, whereas row_number() always returns exactly one row per id.
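The difference between rank() and row_number() only shows up when two rows of the same id share the top version_id. This sketch uses SQLite via Python's `sqlite3` as a stand-in for MySQL 8.0 (window functions need SQLite 3.25 or later); the table name `versions`, the helper `latest`, and the extra tied row `(1, 2, 'textz', 'textz')` are all hypothetical, added only to create a tie.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL 8.0; "versions" is a hypothetical name.
# No unique index here, so a tie on (id, version_id) can be inserted later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)


def latest(conn, ranking):
    """Keep rows ranked 1 within each id. `ranking` must be a trusted
    window function name such as 'rank' or 'row_number'."""
    query = f"""
    SELECT id, version_id, field1, field2
    FROM (
        SELECT id, version_id, field1, field2,
               {ranking}() OVER (PARTITION BY id ORDER BY version_id DESC) AS rnk
        FROM versions
    ) tmp
    WHERE tmp.rnk = 1
    ORDER BY id, field1
    """
    return conn.execute(query).fetchall()


# Without ties, both functions pick the same rows.
by_rank = latest(conn, "rank")
by_row_number = latest(conn, "row_number")

# Hypothetical tie: a second row for id 1 that also has version_id 2.
conn.execute("INSERT INTO versions VALUES (1, 2, 'textz', 'textz')")
tied_rank = latest(conn, "rank")  # both tied rows for id 1 survive
tied_row_number = latest(conn, "row_number")  # one arbitrary row per id
print(tied_rank)
print(tied_row_number)
```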
Answer by Chris Meek
Something like this should work just fine:
select t.*
from table t
inner join
(
select id, max(version_id) maxVersion
from table
group by id
) dvtbl ON t.id = dvtbl.id AND t.version_id = dvtbl.maxVersion
Answer by mcassano
I usually do this with a subquery:
select id, version_id, field1, field2 from datatable as dt where version_id = (select version_id from datatable where id = dt.id order by version_id desc limit 1)
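A correlated-subquery version can be sketched on SQLite via Python's `sqlite3` as a stand-in for MySQL, keeping the answer's table name `datatable`. The sketch compares version_id (not id) against the subquery: comparing id with the id of the same group would be true for every row and return the whole table.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL, using the answer's table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datatable (id INTEGER, version_id INTEGER, field1 TEXT, field2 TEXT)"
)
conn.executemany(
    "INSERT INTO datatable VALUES (?, ?, ?, ?)",
    [
        (1, 1, "texta", "text1"),
        (1, 2, "textb", "text2"),
        (2, 1, "textc", "text3"),
        (2, 2, "textd", "text4"),
        (2, 3, "texte", "text5"),
    ],
)

# For each outer row dt, the subquery finds the newest version_id of dt.id;
# the row is kept only if it is that newest version.
query = """
SELECT id, version_id, field1, field2
FROM datatable AS dt
WHERE version_id = (
    SELECT version_id FROM datatable
    WHERE id = dt.id
    ORDER BY version_id DESC
    LIMIT 1
)
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the subquery runs once per outer row, this form tends to be the slowest of the approaches on large tables unless an index on (id, version_id) lets each lookup stay cheap.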
Answer by Berek Bryan
I think this is what you want.
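select id, max(version_id), field1, field2 from table group by id
-- note: MySQL may take field1/field2 from any row in the group, and this is rejected under ONLY_FULL_GROUP_BY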
The results I get from that are
1, 2, textb, text2
2, 3, texte, text5
Edit: I recreated the table and inserted the same data, with id and version_id as a compound primary key. This gave the results I posted above, also in MySQL.
Answer by Berek Bryan
I haven't tested it, but something like this might work:
SELECT * FROM table GROUP BY id ORDER BY MAX(version_id) DESC