MySQL: Using LIMIT within GROUP BY to get N results per group?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/2129693/

Time: 2020-08-31 15:04:40  Source: igfitidea

Tags: sql, mysql, greatest-n-per-group, ranking

Asked by Wells

The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year    id  rate
2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2009    p01 4.4
2002    p01 3.9
2004    p01 3.5
2005    p01 2.1
2000    p01 0.8
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7
2006    p02 4.6
2007    p02 3.3

What I'd like is only the top 5 results for each id:

2006    p01 8
2003    p01 7.4
2008    p01 6.8
2001    p01 5.9
2007    p01 5.3
2001    p02 12.5
2004    p02 12.4
2002    p02 12.2
2003    p02 10.3
2000    p02 8.7

Is there a way to do this using some kind of LIMIT-like modifier that works within the GROUP BY?

Answer by fthiella

You could use the GROUP_CONCAT aggregate function to collect all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

Then you could use FIND_IN_SET, which returns the position of the first argument within the second (a comma-separated list), e.g.:

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
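To make those return values concrete, here is a minimal Python model of FIND_IN_SET. It is an illustrative sketch, not MySQL's implementation: the name find_in_set is hypothetical, it uses exact string matching (MySQL compares using the collation in effect, which is usually case-insensitive), and it does not model NULL arguments.

```python
def find_in_set(needle: str, haystack: str) -> int:
    """Hypothetical model of MySQL FIND_IN_SET(needle, haystack).

    Returns the 1-based position of needle in the comma-separated
    haystack, or 0 when it is absent. NULL handling is not modelled.
    """
    if not haystack:
        return 0
    for position, item in enumerate(haystack.split(","), start=1):
        if item == needle:
            return position
    return 0


years = "2006,2003,2008,2001,2007,2009,2002,2004,2005,2000"
first = find_in_set("2006", years)
sixth = find_in_set("2009", years)
missing = find_in_set("1999", years)
```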

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by FIND_IN_SET, you could then use this query, which returns only the first 5 years for every id:

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

Please see fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate DESC) on the rate column instead of the year column, and then applying FIND_IN_SET to the rate instead of the year.

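The maximum length of the string returned by GROUP_CONCAT is limited by the group_concat_max_len system variable (1024 bytes by default), and longer results are truncated, so this works well only if you need to select a few records for every group.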
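The whole GROUP_CONCAT plus FIND_IN_SET approach, including the truncation caveat, can be modelled in a few lines of Python. This is a sketch under assumptions: ROWS is the question's sample data typed in by hand, grouped_years and top_n are hypothetical helper names, and max_len stands in for group_concat_max_len (truncation here is by characters, which equals bytes for these ASCII digits).

```python
from collections import defaultdict

# The question's sample rows as (year, id, rate).
ROWS = [
    (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
    (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
    (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
    (2006, "p02", 4.6), (2007, "p02", 3.3),
]


def grouped_years(rows, max_len=1024):
    """Model GROUP_CONCAT(year ORDER BY rate DESC) ... GROUP BY id,
    cut to max_len characters the way group_concat_max_len truncates."""
    groups = defaultdict(list)
    for year, id_, rate in rows:
        groups[id_].append((rate, year))
    result = {}
    for id_, items in groups.items():
        items.sort(key=lambda item: item[0], reverse=True)  # ORDER BY rate DESC
        result[id_] = ",".join(str(year) for _, year in items)[:max_len]
    return result


def top_n(rows, n=5, max_len=1024):
    """Keep rows whose FIND_IN_SET position is BETWEEN 1 AND n."""
    lists = grouped_years(rows, max_len)
    kept = []
    for year, id_, rate in rows:
        items = lists[id_].split(",")
        # First matching position (1-based), or 0 when absent.
        position = items.index(str(year)) + 1 if str(year) in items else 0
        if 1 <= position <= n:
            kept.append((year, id_, rate))
    return kept
```

With the default limit, top_n(ROWS) keeps the five best years per id. With max_len=12 each list is cut to a string like "2006,2003,20", so only two rows per id survive, because the broken trailing token matches no year.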

Answer by Salman A

The original query used user variables and ORDER BY on derived tables; the behavior of both quirks is not guaranteed. Revised answer as follows.

In MySQL 5.x you can use a poor man's rank over partition to achieve the desired result. Just outer join the table with itself and, for each row, count the number of rows in the same group that rank ahead of it. In this case, a row ranks ahead when it has a higher rate:

SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year

Demo and Result:

| id  | rate | year | rank |
|-----|------|------|------|
| p01 |  8.0 | 2006 | 0    |
| p01 |  7.4 | 2003 | 1    |
| p01 |  6.8 | 2008 | 2    |
| p01 |  5.9 | 2001 | 3    |
| p01 |  5.3 | 2007 | 4    |
| p02 | 12.5 | 2001 | 0    |
| p02 | 12.4 | 2004 | 1    |
| p02 | 12.2 | 2002 | 2    |
| p02 | 10.3 | 2003 | 3    |
| p02 |  8.7 | 2000 | 4    |
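The self-join can be checked against the question's data without a MySQL server, using Python's built-in sqlite3 module as a stand-in. That is an assumption: SQLite is not MySQL, but this query uses only portable SQL. The alias is spelled rnk rather than rank because RANK became a reserved word in MySQL 8.0.2; the table name t and the ROWS list are typed in from the question.

```python
import sqlite3

# The question's sample rows as (year, id, rate), typed in by hand.
ROWS = [
    (2006, "p01", 8.0), (2003, "p01", 7.4), (2008, "p01", 6.8),
    (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2002, "p01", 3.9), (2004, "p01", 3.5), (2005, "p01", 2.1),
    (2000, "p01", 0.8), (2001, "p02", 12.5), (2004, "p02", 12.4),
    (2002, "p02", 12.2), (2003, "p02", 10.3), (2000, "p02", 8.7),
    (2006, "p02", 4.6), (2007, "p02", 3.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# For each row, count rows in the same id with a strictly higher rate,
# then keep the rows whose count is 0..4.
QUERY = """
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rnk
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```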

Note that if the rates had ties, for example:

100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...

The above query will return 6 rows:

100, 90, 90, 80, 80, 80

Change to HAVING COUNT(DISTINCT l.rate) < 5 to get 8 rows:

100, 90, 90, 80, 80, 80, 70, 60

Or change the join condition to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key)), where pri_key is the table's primary key, to get 5 rows:

 100, 90, 90, 80, 80
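The three tie-handling variants can be compared side by side on the tie example. This is a sketch, again using sqlite3 as a stand-in for MySQL: the data is the ten rates listed in the example (the trailing "..." is dropped), all in one hypothetical group g1, and the hypothetical column pk plays the role of pri_key. It groups by t.pk instead of t.id, t.rate, t.year so that identical tied rows are not merged into one.

```python
import sqlite3

# Hypothetical tie data: one group, rates with duplicates; pk is the primary key.
RATES = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, id TEXT, rate INTEGER)")
conn.executemany("INSERT INTO t (pk, id, rate) VALUES (?, 'g1', ?)",
                 list(enumerate(RATES, start=1)))


def top_rates(count_expr, join_cond):
    """Run the self-join with a given COUNT expression and join condition."""
    sql = f"""
    SELECT t.rate
    FROM t
    LEFT JOIN t AS l ON t.id = l.id AND ({join_cond})
    GROUP BY t.pk, t.rate
    HAVING {count_expr} < 5
    ORDER BY t.rate DESC, t.pk
    """
    return [rate for (rate,) in conn.execute(sql)]


plain = top_rates("COUNT(l.rate)", "t.rate < l.rate")
distinct = top_rates("COUNT(DISTINCT l.rate)", "t.rate < l.rate")
tie_break = top_rates("COUNT(l.rate)",
                      "t.rate < l.rate OR (t.rate = l.rate AND t.pk > l.pk)")
```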


In MySQL 8 or later, just use the RANK, DENSE_RANK or ROW_NUMBER window functions:

SELECT *
FROM (
    SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
    FROM t
) AS x
WHERE rnk <= 5
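The three window functions differ only in how they treat ties: ROW_NUMBER never ties, RANK leaves gaps after ties, and DENSE_RANK does not. A quick sketch with sqlite3 as a stand-in for MySQL 8 (it assumes the bundled SQLite is 3.25 or newer, the version that added window functions), reusing a hypothetical one-group copy of the tie data from the self-join discussion:

```python
import sqlite3

# Hypothetical tie data: one group, rates with duplicates; pk is the primary key.
RATES = [100, 90, 90, 80, 80, 80, 70, 60, 50, 40]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, id TEXT, rate INTEGER)")
conn.executemany("INSERT INTO t (pk, id, rate) VALUES (?, 'g1', ?)",
                 list(enumerate(RATES, start=1)))


def top_by(func):
    """Keep rows ranked <= 5 by the given window function within each id."""
    sql = f"""
    SELECT rate FROM (
        SELECT rate, pk,
               {func}() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
        FROM t
    ) AS x
    WHERE rnk <= 5
    ORDER BY rate DESC, pk
    """
    return [rate for (rate,) in conn.execute(sql)]


ranked = {f: top_by(f) for f in ("ROW_NUMBER", "RANK", "DENSE_RANK")}
```

On this data RANK() <= 5 returns the same 6 rows as HAVING COUNT(l.rate) < 5, DENSE_RANK() <= 5 the same 8 rows as the COUNT(DISTINCT l.rate) version, and ROW_NUMBER() <= 5 exactly 5 rows, like the primary-key tie-breaker.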

Answer by Vishal Kumar

For me something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N) 

works perfectly. No complicated query.
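For reference, SUBSTRING_INDEX(str, delim, count) returns the part of str before the count-th occurrence of delim (counting from the right when count is negative), which is why it can cut a GROUP_CONCAT list down to its first N items. A minimal Python model of that behaviour; the name substring_index is hypothetical and the delimiter is assumed to be non-empty.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical model of MySQL SUBSTRING_INDEX(s, delim, count).

    count > 0: everything left of the count-th delimiter from the left.
    count < 0: everything right of the |count|-th delimiter from the right.
    count == 0: the empty string. delim must be non-empty.
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


first_two = substring_index("2006,2003,2008", ",", 2)
top_one = substring_index("2006,2003,2008", ",", 1)
```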


For example, to get the top 1 (highest-rate) row for each id:

-- Top row per id; assumes (id, year) identifies a row, as in the question.
SELECT
    *
FROM
    yourtable
WHERE
    (id, year) IN (SELECT
            id,
            SUBSTRING_INDEX(GROUP_CONCAT(year
                            ORDER BY rate DESC),
                        ',',
                        1)
        FROM
            yourtable
        GROUP BY id)
ORDER BY rate DESC;

Answer by bobince

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).

This is a groupwise-maximum type of query, which is not trivial to do in SQL. There are various ways to tackle it, some of which can be more efficient in certain cases, but for top-n in general you'll want to look at Bill's answer to a similar earlier question.

As with most solutions to this problem, it can return more than five rows if several rows share the same rate value, so you may still need some post-processing to check for that.
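One way to do that post-processing on the client, assuming the query returns (id, rate) rows already sorted by id and then by rate descending, is to keep at most n rows per id and drop the rest. cap_per_group is a hypothetical helper, and which of several tied rows survives simply follows the input order.

```python
from itertools import groupby, islice
from operator import itemgetter


def cap_per_group(rows, n):
    """Keep at most n rows per id; rows are (id, rate) tuples sorted by id."""
    kept = []
    for _, group in groupby(rows, key=itemgetter(0)):
        # First n rows of this id; a tie at the cut-off is broken by input order.
        kept.extend(islice(group, n))
    return kept


rows = [("p01", 100), ("p01", 90), ("p01", 90), ("p01", 80), ("p01", 80),
        ("p01", 80), ("p02", 12.5), ("p02", 12.4)]
capped = cap_per_group(rows, 5)
```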

Answer by Brian L Cartwright

This requires a series of subqueries to rank the values, limit them, and then perform the sum while grouping:

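SET @rnk := 0, @last_id := NULL, @n := 2;

-- Caveat: MySQL does not guarantee the evaluation order of user-variable
-- assignments, nor that ORDER BY inside a derived table is honoured.
SELECT
  c.id,
  SUM(c.val)
FROM (
  SELECT
    b.id,
    b.val
  FROM (
    SELECT
      @rnk := IF(@last_id = a.id, @rnk + 1, 1) AS rnk,  -- rank within id
      @last_id := a.id AS last_id,                      -- remember current id
      a.id,
      a.val
    FROM (
      SELECT id, val
      FROM list
      ORDER BY id, val DESC
    ) AS a
  ) AS b
  WHERE b.rnk <= @n                                     -- keep the top @n per id
) AS c
GROUP BY c.id;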
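What the user-variable pattern computes is easier to see in plain Python: walk the rows in id, val DESC order, number them 1, 2, ... restarting whenever id changes, keep ranks 1 through n, and sum per id. A sketch assuming (id, val) tuples; top_n_sums is a hypothetical name, and the sample values are a subset of the question's rates.

```python
def top_n_sums(rows, n):
    """Sum the n largest vals per id, mimicking the @rnk / @last_id walk."""
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))  # ORDER BY id, val DESC
    sums = {}
    last_id = None
    rnk = 0
    for id_, val in ordered:
        rnk = rnk + 1 if id_ == last_id else 1  # @rnk := IF(@last_id = id, @rnk + 1, 1)
        last_id = id_                           # @last_id := id
        if rnk <= n:                            # keep ranks 1..n
            sums[id_] = sums.get(id_, 0) + val  # SUM(val) ... GROUP BY id
    return sums


rows = [("p01", 8), ("p01", 7.4), ("p01", 6.8), ("p02", 12.5),
        ("p02", 12.4), ("p02", 12.2), ("p01", 5.9)]
sums_top2 = top_n_sums(rows, 2)
```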

Answer by Saharsh Shah

Try this:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx 
      FROM (SELECT h.year, h.id, h.rate 
            FROM h
            WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
            GROUP BY id, h.year
            ORDER BY id, rate DESC
            ) h, (SELECT @lastid:='', @index:=0) AS a
    ) h 
WHERE h.indx < 5;  -- indx starts at 0, so < 5 keeps five rows per id

Answer by Ricky Moreno

SELECT year, id, rate
FROM (SELECT
  year, id, rate,
  ROW_NUMBER() OVER (PARTITION BY id ORDER BY rate DESC) AS rn
  FROM h
  WHERE year BETWEEN 2000 AND 2009
  AND id IN (SELECT rid FROM table2)
  GROUP BY id, year
  ORDER BY id, rate DESC) AS subquery
WHERE rn <= 5
ORDER BY id, rate DESC

The subquery is almost identical to your query (this requires MySQL 8.0 or later, which added window functions). The only change is adding

row_number() over (partition by id order by rate DESC)

Answer by Wang Wen'an

Build a virtual row-number column (like ROWNUM in Oracle).

table:

CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

SQL like this:

select t3.year,t3.id,t3.rate 
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3 
where rownum <=3 order by id,rate DESC;

If you remove the WHERE clause on t3, the query returns every row together with its computed rownum, i.e. the position of that row's rate within its id.

To get the top N records: keep the condition rownum <= N (here rownum <= 3) in the WHERE clause of t3.

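To choose the years: add year BETWEEN 2000 AND 2009 to the WHERE clause of t3. Note that this filters after ranking, so rownum is still computed over all years; to rank only within that range, apply the same condition to t1 and t2 as well.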
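The correlated-subquery ranking can be tried with Python's sqlite3 module as a stand-in for MySQL (an assumption; the query itself is plain SQL). The STACK list mirrors the INSERT statements in this answer, and the year filter is applied to t3, so ranks are still computed over all years.

```python
import sqlite3

# Same rows as the INSERT statements: (year, id, rate).
STACK = [
    (2006, "p01", 8), (2001, "p01", 5.9), (2007, "p01", 5.3), (2009, "p01", 4.4),
    (2001, "p02", 12.5), (2004, "p02", 12.4), (2005, "p01", 2.1), (2000, "p01", 0.8),
    (2002, "p02", 12.2), (2002, "p01", 3.9), (2004, "p01", 3.5), (2003, "p02", 10.3),
    (2000, "p02", 8.7), (2006, "p02", 4.6), (2007, "p02", 3.3), (2003, "p01", 7.4),
    (2008, "p01", 6.8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (year INTEGER, id TEXT, rate REAL)")
conn.executemany("INSERT INTO stack VALUES (?, ?, ?)", STACK)

# rownum = how many rows in the same id have a rate >= this row's rate.
QUERY = """
SELECT t3.year, t3.id, t3.rate
FROM (
    SELECT t1.*,
           (SELECT COUNT(*) FROM stack t2
             WHERE t1.rate <= t2.rate AND t1.id = t2.id) AS rownum
    FROM stack t1
) t3
WHERE rownum <= ? AND year BETWEEN 2000 AND 2009
ORDER BY id, rate DESC
"""
top3 = conn.execute(QUERY, (3,)).fetchall()
```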

Answer by John

It took some work, but I thought my solution was worth sharing, as it seems elegant as well as quite fast.

SELECT h.year, h.id, h.rate 
  FROM (
    SELECT id, 
      SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
      FROM h
      WHERE year BETWEEN 2000 AND 2009
      GROUP BY id
      ORDER BY id
  ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id 
      AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
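The join condition SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l is a containment test: the text before the first occurrence of the key equals the whole list only when the key does not occur at all. A small Python sketch of that idea; substring_index_first and in_key_list are hypothetical names, and because this is a plain substring test it relies on keys such as p01-2006 having fixed-width years.

```python
def substring_index_first(s: str, delim: str) -> str:
    """Model of SUBSTRING_INDEX(s, delim, 1): the text before the first delim."""
    return s.split(delim, 1)[0]


def in_key_list(key_list: str, key: str) -> bool:
    """True when key occurs in key_list, mirroring the join condition."""
    return substring_index_first(key_list, key) != key_list


top5 = "p01-2006,p01-2003,p01-2008,p01-2001,p01-2007"
hit = in_key_list(top5, "p01-2008")
miss = in_key_list(top5, "p01-2009")
```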

Note that this example is tailored to the question and can easily be adapted for other, similar purposes.

Answer by Shlomi Noach

The following post, "SQL: selecting top N records per group", describes a complicated way of achieving this without subqueries.

It improves on other solutions offered here by:

  • Doing everything in a single query
  • Being able to properly utilize indexes
  • Avoiding subqueries, notoriously known to produce bad execution plans in MySQL

It is, however, not pretty. A good solution would be achievable if MySQL supported window functions (aka analytic functions), but it did not when this was written; window functions were added in MySQL 8.0. The trick used in that post relies on GROUP_CONCAT, which is sometimes described as "poor man's window functions for MySQL".
