MySQL MySQL中的排名函数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3333665/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 16:40:22  来源:igfitidea点击:

Rank function in MySQL

mysqlsqlrank

提问by Aadi

I need to find out rank of customers. Here I am adding the corresponding ANSI standard SQL query for my requirement. Please help me to convert it to MySQL .

我需要找出客户的等级。在这里,我根据我的要求添加了相应的 ANSI 标准 SQL 查询。请帮我将其转换为 MySQL 。

SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS [Partition by Gender], 
  FirstName, 
  Age,
  Gender 
FROM Person

Is there any function to find out rank in MySQL?

是否有任何功能可以找出 MySQL 中的排名?

回答by Daniel Vassallo

One option is to use a ranking variable, such as the following:

一种选择是使用排名变量,例如:

SELECT    first_name,
          age,
          gender,
          @curRank := @curRank + 1 AS rank
FROM      person p, (SELECT @curRank := 0) r
ORDER BY  age;

The (SELECT @curRank := 0)part allows the variable initialization without requiring a separate SETcommand.

(SELECT @curRank := 0)部件允许在不需要单独SET命令的情况下进行变量初始化。

Test case:

测试用例:

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));

INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Hyman', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');

Result:

结果:

+------------+------+--------+------+
| first_name | age  | gender | rank |
+------------+------+--------+------+
| Kathy      |   18 | F      |    1 |
| Jane       |   20 | F      |    2 |
| Nick       |   22 | M      |    3 |
| Bob        |   25 | M      |    4 |
| Anne       |   25 | F      |    5 |
| Hyman       |   30 | M      |    6 |
| Bill       |   32 | M      |    7 |
| Steve      |   36 | M      |    8 |
+------------+------+--------+------+
8 rows in set (0.02 sec)

回答by Salman A

Here is a generic solution that assigns dense rank over partition to rows. It uses user variables:

这是一个通用的解决方案,它为行分配分区上的密集等级。它使用用户变量:

CREATE TABLE person (
    id INT NOT NULL PRIMARY KEY,
    firstname VARCHAR(10),
    gender VARCHAR(1),
    age INT
);

INSERT INTO person (id, firstname, gender, age) VALUES
(1,  'Adams',  'M', 33),
(2,  'Matt',   'M', 31),
(3,  'Grace',  'F', 25),
(4,  'Harry',  'M', 20),
(5,  'Scott',  'M', 30),
(6,  'Sarah',  'F', 30),
(7,  'Tony',   'M', 30),
(8,  'Lucy',   'F', 27),
(9,  'Zoe',    'F', 30),
(10, 'Megan',  'F', 26),
(11, 'Emily',  'F', 20),
(12, 'Peter',  'M', 20),
(13, 'John',   'M', 21),
(14, 'Kate',   'F', 35),
(15, 'James',  'M', 32),
(16, 'Cole',   'M', 25),
(17, 'Dennis', 'M', 27),
(18, 'Smith',  'M', 35),
(19, 'Zack',   'M', 35),
(20, 'Jill',   'F', 25);

SELECT person.*, @rank := CASE
    WHEN @partval = gender AND @rankval = age THEN @rank
    WHEN @partval = gender AND (@rankval := age) IS NOT NULL THEN @rank + 1
    WHEN (@partval := gender) IS NOT NULL AND (@rankval := age) IS NOT NULL THEN 1
END AS rnk
FROM person, (SELECT @rank := NULL, @partval := NULL, @rankval := NULL) AS x
ORDER BY gender, age;

Notice that the variable assignments are placed inside the CASEexpression. This (in theory) takes care of order of evaluation issue. The IS NOT NULLis added to handle datatype conversion and short circuiting issues.

请注意,变量赋值放置在CASE表达式中。这(理论上)负责评估问题的顺序。将IS NOT NULL被添加到处理数据类型转换和短路的问题。

PS: It can easily be converted to row number over partition by by removing all conditions that check for tie.

PS:通过删除所有检查平局的条件,它可以很容易地转换为分区上的行号。

| id | firstname | gender | age | rank |
|----|-----------|--------|-----|------|
| 11 | Emily     | F      | 20  | 1    |
| 20 | Jill      | F      | 25  | 2    |
| 3  | Grace     | F      | 25  | 2    |
| 10 | Megan     | F      | 26  | 3    |
| 8  | Lucy      | F      | 27  | 4    |
| 6  | Sarah     | F      | 30  | 5    |
| 9  | Zoe       | F      | 30  | 5    |
| 14 | Kate      | F      | 35  | 6    |
| 4  | Harry     | M      | 20  | 1    |
| 12 | Peter     | M      | 20  | 1    |
| 13 | John      | M      | 21  | 2    |
| 16 | Cole      | M      | 25  | 3    |
| 17 | Dennis    | M      | 27  | 4    |
| 7  | Tony      | M      | 30  | 5    |
| 5  | Scott     | M      | 30  | 5    |
| 2  | Matt      | M      | 31  | 6    |
| 15 | James     | M      | 32  | 7    |
| 1  | Adams     | M      | 33  | 8    |
| 18 | Smith     | M      | 35  | 9    |
| 19 | Zack      | M      | 35  | 9    |

Demo on db<>fiddle

db<>fiddle 上的演示

回答by Rahul Agarwal

While the most upvoted answer ranks, it doesn't partition, You can do a self Join to get the whole thing partitioned also:

虽然最受好评的答案排名,但它不分区,您可以执行 self Join 以将整个内容也分区:

SELECT    a.first_name,
      a.age,
      a.gender,
        count(b.age)+1 as rank
FROM  person a left join person b on a.age>b.age and a.gender=b.gender 
group by  a.first_name,
      a.age,
      a.gender

Use Case

用例

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));

INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Hyman', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');

Answer:

回答

Bill    32  M   4
Bob     25  M   2
Hyman    30  M   3
Nick    22  M   1
Steve   36  M   5
Anne    25  F   3
Jane    20  F   2
Kathy   18  F   1

回答by Mukesh Soni

A tweak of Daniel's version to calculate percentile along with rank. Also two people with same marks will get the same rank.

Daniel 版本的调整,用于计算百分位数和排名。两个分数相同的人也将获得相同的排名。

set @totalStudents = 0;
select count(*) into @totalStudents from marksheets;
SELECT id, score, @curRank := IF(@prevVal=score, @curRank, @studentNumber) AS rank, 
@percentile := IF(@prevVal=score, @percentile, (@totalStudents - @studentNumber + 1)/(@totalStudents)*100),
@studentNumber := @studentNumber + 1 as studentNumber, 
@prevVal:=score
FROM marksheets, (
SELECT @curRank :=0, @prevVal:=null, @studentNumber:=1, @percentile:=100
) r
ORDER BY score DESC

Results of the query for a sample data -

示例数据的查询结果 -

+----+-------+------+---------------+---------------+-----------------+
| id | score | rank | percentile    | studentNumber | @prevVal:=score |
+----+-------+------+---------------+---------------+-----------------+
| 10 |    98 |    1 | 100.000000000 |             2 |              98 |
|  5 |    95 |    2 |  90.000000000 |             3 |              95 |
|  6 |    91 |    3 |  80.000000000 |             4 |              91 |
|  2 |    91 |    3 |  80.000000000 |             5 |              91 |
|  8 |    90 |    5 |  60.000000000 |             6 |              90 |
|  1 |    90 |    5 |  60.000000000 |             7 |              90 |
|  9 |    84 |    7 |  40.000000000 |             8 |              84 |
|  3 |    83 |    8 |  30.000000000 |             9 |              83 |
|  4 |    72 |    9 |  20.000000000 |            10 |              72 |
|  7 |    60 |   10 |  10.000000000 |            11 |              60 |
+----+-------+------+---------------+---------------+-----------------+

回答by erandac

Combination of Daniel's and Salman's answer. However the rank will not give as continues sequence with ties exists . Instead it skips the rank to next. So maximum always reach row count.

Daniel 和 Salman 的回答的结合。然而,排名不会给出,因为存在有关系的连续序列。相反,它会跳过排名到下一个。所以最大值总是达到行数。

    SELECT    first_name,
              age,
              gender,
              IF(age=@_last_age,@curRank:=@curRank,@curRank:=@_sequence) AS rank,
              @_sequence:=@_sequence+1,@_last_age:=age
    FROM      person p, (SELECT @curRank := 1, @_sequence:=1, @_last_age:=0) r
    ORDER BY  age;

Schema and Test Case:

架构和测试用例:

CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));

INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Hyman', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9, 'Kamal', 25, 'M');
INSERT INTO person VALUES (10, 'Saman', 32, 'M');

Output:

输出:

+------------+------+--------+------+--------------------------+-----------------+
| first_name | age  | gender | rank | @_sequence:=@_sequence+1 | @_last_age:=age |
+------------+------+--------+------+--------------------------+-----------------+
| Kathy      |   18 | F      |    1 |                        2 |              18 |
| Jane       |   20 | F      |    2 |                        3 |              20 |
| Nick       |   22 | M      |    3 |                        4 |              22 |
| Kamal      |   25 | M      |    4 |                        5 |              25 |
| Anne       |   25 | F      |    4 |                        6 |              25 |
| Bob        |   25 | M      |    4 |                        7 |              25 |
| Hyman       |   30 | M      |    7 |                        8 |              30 |
| Bill       |   32 | M      |    8 |                        9 |              32 |
| Saman      |   32 | M      |    8 |                       10 |              32 |
| Steve      |   36 | M      |   10 |                       11 |              36 |
+------------+------+--------+------+--------------------------+-----------------+

回答by Lukas Eder

Starting with MySQL 8, you can finally use window functions also in MySQL: https://dev.mysql.com/doc/refman/8.0/en/window-functions.html

从 MySQL 8 开始,您最终也可以在 MySQL 中使用窗口函数:https: //dev.mysql.com/doc/refman/8.0/en/window-functions.html

Your query can be written exactly the same way:

您的查询可以完全相同的方式编写:

SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS `Partition by Gender`, 
  FirstName, 
  Age,
  Gender 
FROM Person

回答by David Husnian

@Sam, your point is excellent in concept but I think you misunderstood what the MySQL docs are saying on the referenced page -- or I misunderstand :-) -- and I just wanted to add this so that if someone feels uncomfortable with the @Daniel's answer they'll be more reassured or at least dig a little deeper.

@Sam,你的观点在概念上非常好,但我认为你误解了 MySQL 文档在引用页面上所说的内容——或者我误解了 :-)——我只是想添加这个,以便如果有人对 @ 感到不舒服丹尼尔的回答他们会更放心,或者至少会更深入地挖掘。

You see the "@curRank := @curRank + 1 AS rank"inside the SELECTis not "one statement", it's one "atomic" part of the statement so it should be safe.

你看到"@curRank := @curRank + 1 AS rank"里面SELECT不是“一个语句”,它是语句的一个“原子”部分,所以它应该是安全的。

The document you reference goes on to show examples where the same user-defined variable in 2 (atomic) parts of the statement, for example, "SELECT @curRank, @curRank := @curRank + 1 AS rank".

您引用的文档继续显示示例,其中在语句的 2 个(原子)部分中使用相同的用户定义变量,例如"SELECT @curRank, @curRank := @curRank + 1 AS rank".

One might argue that @curRankis used twice in @Daniel's answer: (1) the "@curRank := @curRank + 1 AS rank"and (2) the "(SELECT @curRank := 0) r"but since the second usage is part of the FROMclause, I'm pretty sure it is guaranteed to be evaluated first; essentially making it a second, and preceding, statement.

有人可能会争辩说,@curRank在@Daniel 的回答中使用了两次:(1) the"@curRank := @curRank + 1 AS rank"和 (2) the"(SELECT @curRank := 0) r"但由于第二次使用是该FROM子句的一部分,我很确定它肯定会首先被评估;基本上使它成为第二个和之前的陈述。

In fact, on that same MySQL docs page you referenced, you'll see the same solution in the comments -- it could be where @Daniel got it from; yeah, I know that it's the comments but it is comments on the official docs page and that does carry some weight.

事实上,在您引用的同一个 MySQL 文档页面上,您会在评论中看到相同的解决方案——它可能是@Daniel 的来源;是的,我知道这是评论,但它是官方文档页面上的评论,确实有一定的分量。

回答by Salman A

The most straight forward solution to determine the rank of a given value is to count the number of values beforeit. Suppose we have the following values:

确定给定值的等级的最直接的解决方案是计算它之前的值的数量。假设我们有以下值:

10 20 30 30 30 40
  • All 30values are considered 3rd
  • All 40values are considered 6th(rank) or 4th(dense rank)
  • 所有30值都被视为第三
  • 所有40值都被视为第 6(排名)或第 4(密集排名)


Now back to the original question. Here is some sample data which is sorted as described in OP (expected ranks are added on the right):

现在回到最初的问题。以下是一些示例数据,按照 OP 中的描述进行排序(在右侧添加了预期的排名):

+------+-----------+------+--------+    +------+------------+
| id   | firstname | age  | gender |    | rank | dense_rank |
+------+-----------+------+--------+    +------+------------+
|   11 | Emily     |   20 | F      |    |    1 |          1 |
|    3 | Grace     |   25 | F      |    |    2 |          2 |
|   20 | Jill      |   25 | F      |    |    2 |          2 |
|   10 | Megan     |   26 | F      |    |    4 |          3 |
|    8 | Lucy      |   27 | F      |    |    5 |          4 |
|    6 | Sarah     |   30 | F      |    |    6 |          5 |
|    9 | Zoe       |   30 | F      |    |    6 |          5 |
|   14 | Kate      |   35 | F      |    |    8 |          6 |
|    4 | Harry     |   20 | M      |    |    1 |          1 |
|   12 | Peter     |   20 | M      |    |    1 |          1 |
|   13 | John      |   21 | M      |    |    3 |          2 |
|   16 | Cole      |   25 | M      |    |    4 |          3 |
|   17 | Dennis    |   27 | M      |    |    5 |          4 |
|    5 | Scott     |   30 | M      |    |    6 |          5 |
|    7 | Tony      |   30 | M      |    |    6 |          5 |
|    2 | Matt      |   31 | M      |    |    8 |          6 |
|   15 | James     |   32 | M      |    |    9 |          7 |
|    1 | Adams     |   33 | M      |    |   10 |          8 |
|   18 | Smith     |   35 | M      |    |   11 |          9 |
|   19 | Zack      |   35 | M      |    |   11 |          9 |
+------+-----------+------+--------+    +------+------------+

To calculate RANK() OVER (PARTITION BY Gender ORDER BY Age)for Sarah, you can use this query:

要计算RANK() OVER (PARTITION BY Gender ORDER BY Age)莎拉,您可以使用此查询:

SELECT COUNT(id) + 1 AS rank, COUNT(DISTINCT age) + 1 AS dense_rank
FROM testdata
WHERE gender = (SELECT gender FROM testdata WHERE id = 6)
AND age < (SELECT age FROM testdata WHERE id = 6)

+------+------------+
| rank | dense_rank |
+------+------------+
|    6 |          5 |
+------+------------+

To calculate RANK() OVER (PARTITION BY Gender ORDER BY Age)for Allrows you can use this query:

要计算RANK() OVER (PARTITION BY Gender ORDER BY Age)所有行,你可以使用此查询:

SELECT testdata.id, COUNT(lesser.id) + 1 AS rank, COUNT(DISTINCT lesser.age) + 1 AS dense_rank
FROM testdata
LEFT JOIN testdata AS lesser ON lesser.age < testdata.age AND lesser.gender = testdata.gender
GROUP BY testdata.id

And here is the result (joined values are added on right):

这是结果(连接的值添加在右侧):

+------+------+------------+    +-----------+-----+--------+
| id   | rank | dense_rank |    | firstname | age | gender |
+------+------+------------+    +-----------+-----+--------+
|   11 |    1 |          1 |    | Emily     |  20 | F      |
|    3 |    2 |          2 |    | Grace     |  25 | F      |
|   20 |    2 |          2 |    | Jill      |  25 | F      |
|   10 |    4 |          3 |    | Megan     |  26 | F      |
|    8 |    5 |          4 |    | Lucy      |  27 | F      |
|    6 |    6 |          5 |    | Sarah     |  30 | F      |
|    9 |    6 |          5 |    | Zoe       |  30 | F      |
|   14 |    8 |          6 |    | Kate      |  35 | F      |
|    4 |    1 |          1 |    | Harry     |  20 | M      |
|   12 |    1 |          1 |    | Peter     |  20 | M      |
|   13 |    3 |          2 |    | John      |  21 | M      |
|   16 |    4 |          3 |    | Cole      |  25 | M      |
|   17 |    5 |          4 |    | Dennis    |  27 | M      |
|    5 |    6 |          5 |    | Scott     |  30 | M      |
|    7 |    6 |          5 |    | Tony      |  30 | M      |
|    2 |    8 |          6 |    | Matt      |  31 | M      |
|   15 |    9 |          7 |    | James     |  32 | M      |
|    1 |   10 |          8 |    | Adams     |  33 | M      |
|   18 |   11 |          9 |    | Smith     |  35 | M      |
|   19 |   11 |          9 |    | Zack      |  35 | M      |
+------+------+------------+    +-----------+-----+--------+

回答by Sam Kidman

If you want to rank just one person you can do the following:

如果您只想对一个人进行排名,您可以执行以下操作:

SELECT COUNT(Age) + 1
 FROM PERSON
WHERE(Age < age_to_rank)

This ranking corresponds to the oracle RANK function (Where if you have people with the same age they get the same rank, and the ranking after that is non-consecutive).

这个排名对应于oracle RANK函数(如果你有相同年龄的人他们得到相同的排名,之后的排名是不连续的)。

It's a little bit faster than using one of the above solutions in a subquery and selecting from that to get the ranking of one person.

它比在子查询中使用上述解决方案之一并从中选择以获得一个人的排名要快一点。

This can be used to rank everyone but it's slower than the above solutions.

这可用于对每个人进行排名,但它比上述解决方案慢。

SELECT
  Age AS age_var,
(
  SELECT COUNT(Age) + 1
  FROM Person
  WHERE (Age < age_var)
 ) AS rank
 FROM Person

回答by Max

To avoid the "however" in Erandac's answer in combination of Daniel's and Salman's answers, one may use one of the following "partition workarounds"

为了避免结合 Daniel 和 Salman 的回答,Erandac 的回答中出现“然而”,可以使用以下“分区解决方法”之一

SELECT customerID, myDate

  -- partition ranking works only with CTE / from MySQL 8.0 on
  , RANK() OVER (PARTITION BY customerID ORDER BY dateFrom) AS rank, 

  -- Erandac's method in combination of Daniel's and Salman's
  -- count all items in sequence, maximum reaches row count.
  , IF(customerID=@_lastRank, @_curRank:=@_curRank, @_curRank:=@_sequence+1) AS sequenceRank
  , @_sequence:=@_sequence+1 as sequenceOverAll

  -- Dense partition ranking, works also with MySQL 5.7
  -- remember to set offset values in from clause
  , IF(customerID=@_lastRank, @_nxtRank:=@_nxtRank, @_nxtRank:=@_nxtRank+1 ) AS partitionRank
  , IF(customerID=@_lastRank, @_overPart:=@_overPart+1, @_overPart:=1 ) AS partitionSequence

  , @_lastRank:=customerID
FROM myCustomers, 
  (SELECT @_curRank:=0, @_sequence:=0, @_lastRank:=0, @_nxtRank:=0, @_overPart:=0 ) r
ORDER BY customerID, myDate

The partition ranking in the 3rd variant in this code snippet will return continous ranking numbers. this will lead to a data structur similar to the rank() over partition byresult. As an example, see below. In particular, the partitionSequence will always start with 1 for each new partitionRank, using this method:

此代码片段中第三个变体中的分区排名将返回连续的排名数字。这将导致类似于rank() over partition by结果的数据结构。例如,请参见下文。特别是,对于每个新的 partitionRankpartitionSequence 将始终从 1 开始,使用以下方法:

customerID    myDate   sequenceRank (Erandac)
                          |    sequenceOverAll
                          |     |   partitionRank
                          |     |     | partitionSequence
                          |     |     |    | lastRank
... lines ommitted for clarity
40    09.11.2016 11:19    1     44    1   44    40
40    09.12.2016 12:08    1     45    1   45    40
40    09.12.2016 12:08    1     46    1   46    40
40    09.12.2016 12:11    1     47    1   47    40
40    09.12.2016 12:12    1     48    1   48    40
40    13.10.2017 16:31    1     49    1   49    40
40    15.10.2017 11:00    1     50    1   50    40
76    01.07.2015 00:24    51    51    2    1    76
77    04.08.2014 13:35    52    52    3    1    77
79    15.04.2015 20:25    53    53    4    1    79
79    24.04.2018 11:44    53    54    4    2    79
79    08.10.2018 17:37    53    55    4    3    79
117   09.07.2014 18:21    56    56    5    1   117
119   26.06.2014 13:55    57    57    6    1   119
119   02.03.2015 10:23    57    58    6    2   119
119   12.10.2015 10:16    57    59    6    3   119
119   08.04.2016 09:32    57    60    6    4   119
119   05.10.2016 12:41    57    61    6    5   119
119   05.10.2016 12:42    57    62    6    6   119
...