MySQL: Using LIMIT within GROUP BY to get N results per group?
Original question: http://stackoverflow.com/questions/2129693/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow and release it under the same license.
Asked by Wells
The following query:
SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC
yields:
year id rate
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2009 p01 4.4
2002 p01 3.9
2004 p01 3.5
2005 p01 2.1
2000 p01 0.8
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
2006 p02 4.6
2007 p02 3.3
What I'd like is only the top 5 results for each id:
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?
Answer by fthiella
You could use the GROUP_CONCAT aggregate function to get all years into a single column, grouped by id and ordered by rate:
SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM yourtable
GROUP BY id
Result:
-----------------------------------------------------------
| ID | GROUPED_YEAR |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007 |
-----------------------------------------------------------
You could then use FIND_IN_SET, which returns the position of the first argument within the comma-separated list given as the second argument, e.g.:
SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1
SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
Combining GROUP_CONCAT and FIND_IN_SET, and filtering by the position that FIND_IN_SET returns, you could then use this query, which returns only the top 5 years (by rate) for every id:
SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;
Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.
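A minimal sketch of that tie-aware variant, reusing the yourtable placeholder and the columns from the queries above. It assumes that rate converts to the same string inside GROUP_CONCAT and FIND_IN_SET (worth checking for FLOAT columns), and it adds DESC so that the highest rates come first. It keeps every row whose rate is among the five highest distinct rates of its id, so tied rows are all returned:

```sql
-- Sketch only: yourtable is the placeholder table name used in this answer.
SELECT
    yourtable.*
FROM
    yourtable
    INNER JOIN (
        -- One row per id holding its distinct rates, highest first.
        SELECT
            id,
            GROUP_CONCAT(DISTINCT rate ORDER BY rate DESC) AS grouped_rate
        FROM
            yourtable
        GROUP BY id
    ) AS group_max
        ON yourtable.id = group_max.id
       -- Keep rows whose rate is one of the first five entries in the list.
       AND FIND_IN_SET(yourtable.rate, group_max.grouped_rate) BETWEEN 1 AND 5
ORDER BY
    yourtable.id, yourtable.rate DESC;
```

With ties this can return more than five rows per id, because every row sharing one of the five highest rates qualifies.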
The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.
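The limit mentioned here is controlled by the group_concat_max_len system variable, whose default is 1024 bytes; longer results are truncated with a warning. If the concatenated lists might exceed that, the limit can be inspected and raised for the current session. The value 100000 below is only an illustrative choice:

```sql
-- Show the current limit for GROUP_CONCAT results (default: 1024 bytes).
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise the limit for this session only; 100000 is an arbitrary example value.
SET SESSION group_concat_max_len = 100000;
```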
Answer by Salman A
The original version of this answer used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. The revised answer follows.
In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result. Outer join the table with itself and, for each row, count the number of rows that rank before it. In this case, a row ranks before another when it has a higher rate:
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
| id | rate | year | rank |
|-----|------|------|------|
| p01 | 8.0 | 2006 | 0 |
| p01 | 7.4 | 2003 | 1 |
| p01 | 6.8 | 2008 | 2 |
| p01 | 5.9 | 2001 | 3 |
| p01 | 5.3 | 2007 | 4 |
| p02 | 12.5 | 2001 | 0 |
| p02 | 12.4 | 2004 | 1 |
| p02 | 12.2 | 2002 | 2 |
| p02 | 10.3 | 2003 | 3 |
| p02 | 8.7 | 2000 | 4 |
Note that if the rates had ties, for example:
100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...
The above query will return 6 rows:
100, 90, 90, 80, 80, 80
Change to HAVING COUNT(DISTINCT l.rate) < 5 to get 8 rows:
100, 90, 90, 80, 80, 80, 70, 60
Or change to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key)) to get 5 rows:
100, 90, 90, 80, 80
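For reference, here is a sketch of the complete query with that tie-breaking join condition. It assumes pri_key is a unique, non-NULL column of t (the name is a placeholder taken from the condition above), and it groups by pri_key so that each row is ranked on its own:

```sql
-- Sketch only: pri_key is a placeholder for any unique, non-NULL column of t.
SELECT t.id, t.rate, t.year, COUNT(l.pri_key) AS rnk
FROM t
LEFT JOIN t AS l
       ON t.id = l.id
      AND (t.rate < l.rate                                   -- l has a higher rate
           OR (t.rate = l.rate AND t.pri_key > l.pri_key))   -- tie: lower key goes first
GROUP BY t.id, t.pri_key, t.rate, t.year
HAVING COUNT(l.pri_key) < 5    -- fewer than 5 rows rank before this one
ORDER BY t.id, t.rate DESC, t.pri_key;
```

Because no two rows of the same id can rank equally under this condition, each id yields at most 5 rows.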
In MySQL 8 or later, just use the RANK, DENSE_RANK or ROW_NUMBER functions:
SELECT *
FROM (
SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
FROM t
) AS x
WHERE rnk <= 5
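The three functions differ only in how they number tied rows, which mirrors the 6, 8 and 5 row outcomes described for the self-join variants above. A small sketch that shows all three side by side on the same table t (the aliases row_num, rnk and dense_rnk are illustrative names):

```sql
-- For rates 100, 90, 90, 80 within one id:
--   ROW_NUMBER gives 1, 2, 3, 4  (ties broken arbitrarily)
--   RANK       gives 1, 2, 2, 4  (gaps after ties)
--   DENSE_RANK gives 1, 2, 2, 3  (no gaps)
SELECT id, year, rate,
       ROW_NUMBER() OVER w AS row_num,
       RANK()       OVER w AS rnk,
       DENSE_RANK() OVER w AS dense_rnk
FROM t
WINDOW w AS (PARTITION BY id ORDER BY rate DESC);
```

Filtering on row_num <= 5 returns exactly five rows per id, rnk <= 5 may return more when ties straddle the cutoff, and dense_rnk <= 5 returns every row with one of the five highest distinct rates.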
Answer by Vishal Kumar
For me something like
SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)
works perfectly. No complicated query.
For example, to get the top 1 for each group:
SELECT
*
FROM
yourtable
WHERE
id IN (SELECT
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
Answer by bobince
No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQL versions, but not for 5 results per group).
This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle it that can be more efficient in some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.
As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need some post-processing to check for that.
Answer by Brian L Cartwright
This requires a series of subqueries to rank the values, limit them, and then perform the sum while grouping:
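-- Note: relies on user-variable evaluation order and on ORDER BY in a
-- derived table, neither of which MySQL guarantees.
SET @Rnk := 0;
SET @last_id := NULL;
SET @N := 2;  -- number of rows to keep per id
select
c.id,
sum(c.val)
from (
select
b.id,
b.val
from (
select
-- restart the counter at 1 whenever the id changes
@Rnk := if(@last_id = a.id, @Rnk + 1, 1) as Rnk,
a.id,
a.val,
@last_id := a.id as last_id
from (
select
id,
val
from list
order by id, val desc) as a) as b
where b.Rnk <= @N) as c
group by c.id;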
Answer by Saharsh Shah
Try this:
SELECT h.year, h.id, h.rate
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx
FROM (SELECT h.year, h.id, h.rate
FROM h
WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
GROUP BY id, h.year
ORDER BY id, rate DESC
) h, (SELECT @lastid:='', @index:=0) AS a
) h
WHERE h.indx < 5;
Answer by Ricky Moreno
SELECT year, id, rate
FROM (SELECT
year, id, rate, row_number() over (partition by id order by rate DESC) AS row_num
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC) as subquery
WHERE row_num <= 5
The subquery is almost identical to your query. The only change is adding
row_number() over (partition by id order by rate DESC) AS row_num
Answer by Wang Wen'an
Build a virtual row-number column (similar to ROWNUM in Oracle).
Table:
CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Data:
insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);
The query looks like this:
select t3.year,t3.id,t3.rate
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3
where rownum <=3 order by id,rate DESC;
If you delete the where clause on t3, the query returns every row together with its rownum.
To get the top N records, add "rownum <= N" to the where clause on t3 (here, rownum <= 3).
To restrict the years, add "year BETWEEN 2000 AND 2009" to the where clause on t3.
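One caveat when restricting the years: if the filter is applied only to t3, rownum is still computed against rows from all years, so a group can end up with fewer than N rows from the chosen range. A sketch that applies the same year range both to the rows being ranked and inside the counting subquery, using the stack table defined above:

```sql
SELECT t3.year, t3.id, t3.rate
FROM (
    SELECT t1.*,
           -- Count rows of the same id, within the year range,
           -- whose rate is at least as high as this row's rate.
           (SELECT COUNT(*)
            FROM stack t2
            WHERE t2.id = t1.id
              AND t2.rate >= t1.rate
              AND t2.year BETWEEN 2000 AND 2009) AS rownum
    FROM stack t1
    WHERE t1.year BETWEEN 2000 AND 2009
) t3
WHERE rownum <= 3
ORDER BY id, rate DESC;
```

As with the original query, tied rates share the same rownum, so ties at the cutoff can change how many rows come back.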
Answer by John
It took some work, but I thought my solution would be worth sharing, as it seems elegant as well as quite fast.
SELECT h.year, h.id, h.rate
FROM (
SELECT id,
SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id
ORDER BY id
) AS h_temp
LEFT JOIN h ON h.id = h_temp.id
AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
Note that this example is specified for the purpose of the question and can be modified quite easily for other similar purposes.
Answer by Shlomi Noach
The following post, "SQL: selecting top N records per group", describes a complicated way of achieving this without subqueries.
It improves on other solutions offered here by:
- Doing everything in a single query
- Being able to properly utilize indexes
- Avoiding subqueries, notoriously known to produce bad execution plans in MySQL
It is, however, not pretty. A good solution would be achievable if Window Functions (aka Analytic Functions) were available in MySQL, but in MySQL 5.x they are not (MySQL 8 and later do provide them, as another answer above notes). The trick used in said post utilizes GROUP_CONCAT, which is sometimes described as "poor man's Window Functions for MySQL".