MySQL: get the top n records for each group of grouped results

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/12113699/

Time: 2020-08-31 14:37:11  Source: igfitidea

Get top n records for each group of grouped results

Tags: mysql, sql, greatest-n-per-group, mysql-variables

Asked by Yarin

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:

Given a table like the one below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order.)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

Desired result set:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+


NOTE: This question builds on a previous one, "Get records with max value for each group of grouped SQL results", which was about getting a single top row from each group and which received a great MySQL-specific answer from @Bohemian:

select * 
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`

Would love to be able to build off this, though I don't see how.

Accepted answer by Taryn

Here is one way to do this, using UNION ALL (see SQL Fiddle with Demo). This works with two groups; if you have more than two groups, you would need to specify the group number and add a query for each group:

(
  select *
  from mytable 
  where `group` = 1
  order by age desc
  LIMIT 2
)
UNION ALL
(
  select *
  from mytable 
  where `group` = 2
  order by age desc
  LIMIT 2
)
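
Writing one branch per group by hand does not scale, so the query text can be generated instead. The following is a minimal sketch, not part of the original answer: it uses Python's sqlite3 module as a stand-in for MySQL, a hypothetical helper named top_n_union_all, and a column named grp (rather than group) to avoid quoting a reserved word.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table (assumption: SQLite
# stands in for MySQL; the group column is renamed to grp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)


def top_n_union_all(conn, n):
    """Build one LIMIT-ed branch per group and glue them with UNION ALL."""
    groups = [g for (g,) in conn.execute(
        "SELECT DISTINCT grp FROM mytable ORDER BY grp")]
    if not groups:
        return []
    # Each branch is wrapped in SELECT * FROM (...) so that ORDER BY and
    # LIMIT apply to that branch only; MySQL would also accept the bare
    # parenthesized form used in the answer above.
    branch = ("SELECT * FROM (SELECT person, grp, age FROM mytable "
              "WHERE grp = ? ORDER BY age DESC, person LIMIT ?)")
    sql = " UNION ALL ".join([branch] * len(groups))
    params = [value for g in groups for value in (g, n)]
    return conn.execute(sql, params).fetchall()


union_result = top_n_union_all(conn, 2)
print(union_result)
```

The extra "person" sort key inside each branch is an addition that applies the alphabetical tie-break requested in the question.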

There are a variety of ways to do this; see this article to determine the best route for your situation:

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Edit:

This might work for you too; it generates a row number for each record. Using an example from the link above, this will return only those records with a row number less than or equal to 2:

select person, `group`, age
from 
(
   select person, `group`, age,
      (@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number 
  from test t
  CROSS JOIN (select @num:=0, @group:=null) c
  order by `Group`, Age desc, person
) as x 
where x.row_number <= 2;

See Demo

Answer by Mark Byers

In other databases you can do this using ROW_NUMBER. MySQL (before version 8.0) doesn't support ROW_NUMBER, but you can use variables to emulate it:

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

See it working online: sqlfiddle

Edit: I just noticed that bluefeet posted a very similar answer: +1 to him. However, this answer has two small advantages:

  1. It is a single query. The variables are initialized inside the SELECT statement.
  2. It handles ties as described in the question (alphabetical order by name).

So I'll leave it here in case it can help someone.

Answer by snuffn

Try this:

SELECT a.person, a.group, a.age FROM person AS a WHERE 
(SELECT COUNT(*) FROM person AS b 
WHERE b.group = a.group AND b.age >= a.age) <= 2 
ORDER BY a.group ASC, a.age DESC

DEMO
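
The counting subquery above treats people of equal age as interchangeable, so a tie at the cut-off can drop rows. The sketch below is one possible variant, not the answer author's code: it counts only the rows that sort strictly ahead of the current one (older, or same age with an alphabetically earlier name), which gives the tie-break asked for in the question. It assumes Python's sqlite3 module as a stand-in for MySQL, a column named groupname instead of group, and an extra illustrative row ('Joe', 2, 36) to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person TEXT, groupname INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
     ("Joe", 2, 36)],  # illustrative extra row: ties with Paul on age
)

# A row is kept when fewer than n rows of its group sort ahead of it.
# "Ahead" means older, or the same age with an alphabetically earlier name.
TIE_AWARE_SQL = """
SELECT a.person, a.groupname, a.age
FROM person AS a
WHERE (SELECT COUNT(*)
       FROM person AS b
       WHERE b.groupname = a.groupname
         AND (b.age > a.age
              OR (b.age = a.age AND b.person < a.person))) < ?
ORDER BY a.groupname, a.age DESC, a.person
"""

tie_aware_result = conn.execute(TIE_AWARE_SQL, (2,)).fetchall()
print(tie_aware_result)
```

With the tie in group 2, Joe is kept and Paul is dropped because Joe sorts first alphabetically. Like the original, this is a correlated subquery, so it may be slow on large tables.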

Answer by snuffn

How about using self-joining:

CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);

SELECT a.* FROM mytable AS a
  LEFT JOIN mytable AS a2 
    ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;

gives me:

a.person    a.groupname  a.age     
----------  -----------  ----------
Shawn       1            42        
Jill        1            34        
Laura       2            39        
Paul        2            36      

I was strongly inspired by Bill Karwin's answer to "Select top 10 records for each category".

Also, I'm using SQLite, but this should work on MySQL.

Another thing: in the above, I replaced the group column with a groupname column for convenience.

Edit:

Following up on the OP's comment regarding missing tie results, I built on snuffin's answer to show all the ties. This means that if the last rows are tied, more than 2 rows can be returned, as shown below:

.headers on
.mode column

CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);


-- Keep every row that has fewer than 2 strictly older people in its own group,
-- so all rows tied with the 2nd-oldest age are returned.
SELECT a.person, a.groupname, a.age 
FROM foo AS a 
WHERE (SELECT COUNT(*)
       FROM foo AS b
       WHERE b.groupname = a.groupname AND b.age > a.age) < 2
ORDER BY a.groupname ASC, a.age DESC;

gives me:

person      groupname   age       
----------  ----------  ----------
Shawn       1           42        
Jill        1           34        
Laura       2           39        
Paul        2           36        
Joe         2           36        
Chuck       3           112      

Answer by Laurent PELE

Snuffin's solution seems quite slow to execute when you have plenty of rows, and the Mark Byers/Rick James and Bluefeet solutions don't work in my environment (MySQL 5.6) because ORDER BY is applied after the SELECT is executed. So here is a variant of the Mark Byers/Rick James solutions that fixes this issue (with an extra nested select):

select person, groupname, age
from
(
    select person, groupname, age,
    (@rn:=if(@prev = groupname, @rn +1, 1)) as rownumb,
    @prev:= groupname 
    from 
    (
        select person, groupname, age
        from persons 
        order by groupname ,  age desc, person
    )   as sortedlist
    JOIN (select @prev:=NULL, @rn :=0) as vars
) as groupedlist 
where rownumb<=2
order by groupname ,  age desc, person;

I tried a similar query on a table with 5 million rows, and it returned results in less than 3 seconds.

Answer by Travesty3

Check this out:

SELECT
  p.Person,
  p.`Group`,
  p.Age
FROM
  people p
  INNER JOIN
  (
    SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`
    UNION
    SELECT MAX(p3.Age) AS Age, p3.`Group` FROM people p3 INNER JOIN (SELECT MAX(Age) AS Age, `Group` FROM people GROUP BY `Group`) p4 ON p3.Age < p4.Age AND p3.`Group` = p4.`Group` GROUP BY `Group`
  ) p2 ON p.Age = p2.Age AND p.`Group` = p2.`Group`
ORDER BY
  `Group`,
  Age DESC,
  Person;

SQL Fiddle: http://sqlfiddle.com/#!2/cdbb6/15

Answer by Rick James

If the other answers are not fast enough, give this code a try:

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Output:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...

Answer by Jon Bown

I wanted to share this because I spent a long time searching for an easy way to implement this in a Java program I'm working on. This doesn't quite give the output you're looking for, but it's close. The MySQL function GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, it's a great solution. Let's say I have a table called 'student' with student ids, their gender, and GPA, and I want the top 5 GPAs for each gender. Then I can write the query like this:

SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5) 
AS subcategories FROM student GROUP BY sex;

Note that the parameter '5' tells it how many entries to concatenate into each row

And the output would look something like

+--------+----------------+
| Male   | 4,4,4,4,3.9    |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+

You can also change the ORDER BY expression to order them a different way. So if I had the student's age, I could replace 'gpa desc' with 'age desc' and it would work. You can also add columns to the GROUP BY clause to get more columns in the output. So this is just a way I found that is pretty flexible and works well if you are OK with just listing results.

Answer by Prakash

In SQL Server, ROW_NUMBER() is a powerful function that gets the result easily, as below:

select Person,[group],age
from
(
select * ,row_number() over(partition by [group] order by age desc) rn
from mytable
) t
where rn <= 2
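
The same window-function query is not limited to SQL Server: MySQL 8.0 and later and SQLite 3.25 and later also provide ROW_NUMBER() OVER (...). Here is a small runnable sketch, assuming Python's sqlite3 module with a bundled SQLite of at least 3.25, a column named groupname instead of group, and an added "person" sort key for the alphabetical tie-break from the question.

```python
import sqlite3

# Assumption: the SQLite library bundled with Python is 3.25 or newer,
# which is when window functions were added.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (person TEXT, groupname INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Number the rows inside each group, oldest first, names breaking ties,
# then keep only the first n numbers of every group.
TOP_N_SQL = """
SELECT person, groupname, age
FROM (
    SELECT person, groupname, age,
           ROW_NUMBER() OVER (PARTITION BY groupname
                              ORDER BY age DESC, person) AS rn
    FROM mytable
) AS t
WHERE rn <= ?
ORDER BY groupname, age DESC, person
"""

window_result = conn.execute(TOP_N_SQL, (2,)).fetchall()
print(window_result)
```

Where window functions are available, this form avoids relying on the evaluation order of user variables.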

Answer by kovac

There is a really nice answer to this problem at "MySQL - How To Get Top N Rows per Each Group".

Based on the solution in the referenced link, your query would be like:

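SELECT Person, `Group`, Age
   FROM
     (SELECT Person, `Group`, Age, 
                  -- compare against the same variable that is assigned on the next line
                  @group_rank := IF(@current_group = `Group`, @group_rank + 1, 1) AS group_rank,
                  @current_group := `Group` 
       FROM `your_table`
       ORDER BY `Group`, Age DESC
     ) ranked
   WHERE group_rank <= n  -- replace n with the number of rows wanted per group
   ORDER BY `Group`, Age DESC;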

where n is the number of top rows you want per group and your_table is the name of your table.

I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:

Currently MySQL does not support ROW_NUMBER() function that can assign a sequence number within a group, but as a workaround we can use MySQL session variables.

These variables do not require declaration, and can be used in a query to do calculations and to store intermediate results.

@current_country := country This code is executed for each row and stores the value of country column to @current_country variable.

@country_rank := IF(@current_country = country, @country_rank + 1, 1) In this code, if @current_country is the same we increment rank, otherwise set it to 1. For the first row @current_country is NULL, so rank is also set to 1.

For correct ranking, we need to have ORDER BY country, population DESC
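
The per-row logic described in the quoted explanation can be mirrored in a few lines of ordinary code, which may make the role of the two variables easier to see. This is only an illustrative sketch in Python, applied to the question's person/group/age table rather than the country/population example; the function name top_n_by_rank and the alphabetical tie-break are assumptions added here.

```python
def top_n_by_rank(rows, n):
    """Mimic @rank := IF(@current = grp, @rank + 1, 1) over sorted rows."""
    # Equivalent of ORDER BY group, age DESC, person.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    current_group = None  # plays the role of @current_country
    rank = 0              # plays the role of @country_rank
    kept = []
    for person, group, age in ordered:
        # Same group as the previous row: increment; otherwise restart at 1.
        rank = rank + 1 if group == current_group else 1
        current_group = group
        if rank <= n:
            kept.append((person, group, age))
    return kept


people = [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
          ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)]
ranked_result = top_n_by_rank(people, 2)
print(ranked_result)
```

The sort has to happen before the loop, which is the same reason the SQL version depends on the rows being ordered before the variables are evaluated.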
