SQL RANK() 与 ROW_NUMBER()
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7747327/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL RANK() versus ROW_NUMBER()
提问by dotNET Hobbiest
I'm confused about the differences between these. Running the following SQL gets me two idential result sets. Can someone please explain the differences?
我对这些之间的差异感到困惑。运行以下 SQL 会得到两个相同的结果集。有人可以解释这些差异吗?
SELECT ID, [Description], RANK() OVER(PARTITION BY StyleID ORDER BY ID) as 'Rank' FROM SubStyle
SELECT ID, [Description], ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) as 'RowNumber' FROM SubStyle
回答by Martin Smith
You will only see the difference if you have ties within a partition for a particular ordering value.
如果您在特定排序值的分区内有联系,您将只会看到差异。
RANK
and DENSE_RANK
are deterministic in this case, all rows with the same value for both the ordering and partitioning columns will end up with an equal result, whereas ROW_NUMBER
will arbitrarily (non deterministically) assign an incrementing result to the tied rows.
RANK
并且 DENSE_RANK
在这种情况下是确定性的,排序和分区列具有相同值的所有行最终将得到相同的结果,而ROW_NUMBER
将任意(非确定性地)将递增结果分配给绑定行。
Example:(All rows have the same StyleID
so are in the same partition and within that partition the first 3 rows are tied when ordered by ID
)
示例:(所有行都相同,StyleID
因此位于同一分区中,并且在该分区内,前 3 行在按 排序时并列ID
)
WITH T(StyleID, ID)
AS (SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,1 UNION ALL
SELECT 1,2)
SELECT *,
RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'RANK',
ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS 'ROW_NUMBER',
DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'DENSE_RANK'
FROM T
Returns
退货
StyleID ID RANK ROW_NUMBER DENSE_RANK
----------- -------- --------- --------------- ----------
1 1 1 1 1
1 1 1 2 1
1 1 1 3 1
1 2 4 4 2
You can see that for the three identical rows the ROW_NUMBER
increments, the RANK
value remains the same then it leaps to 4
. DENSE_RANK
also assigns the same rank to all three rows but then the next distinct value is assigned a value of 2.
您可以看到,对于三个相同的行,ROW_NUMBER
增量、RANK
值保持不变然后跃升至4
。DENSE_RANK
还为所有三行分配相同的等级,但随后为下一个不同的值分配值 2。
回答by Ritesh Mengji
ROW_NUMBER :Returns a unique number for each row starting with 1. For rows that have duplicate values,numbers are arbitarily assigned.
ROW_NUMBER :为每行返回一个唯一编号,从 1 开始。对于具有重复值的行,将任意分配编号。
Rank :Assigns a unique number for each row starting with 1,except for rows that have duplicate values,in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.
Rank :为每行分配一个唯一编号,从 1 开始,具有重复值的行除外,在这种情况下,分配相同的排名,并且每个重复排名的序列中都会出现间隙。
回答by Lukas Eder
This article covers an interesting relationship between ROW_NUMBER()
and DENSE_RANK()
(the RANK()
function is not treated specifically). When you need a generated ROW_NUMBER()
on a SELECT DISTINCT
statement, the ROW_NUMBER()
will produce distinct values beforethey are removed by the DISTINCT
keyword. E.g. this query
这篇文章介绍了ROW_NUMBER()
和之间的一个有趣的关系DENSE_RANK()
(该RANK()
函数没有特别处理)。当您需要ROW_NUMBER()
在SELECT DISTINCT
语句上生成a 时,ROW_NUMBER()
将在它们被DISTINCT
关键字删除之前生成不同的值。例如这个查询
SELECT DISTINCT
v,
ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... might produce this result (DISTINCT
has no effect):
... 可能会产生此结果(DISTINCT
无效):
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| a | 2 |
| a | 3 |
| b | 4 |
| c | 5 |
| c | 6 |
| d | 7 |
| e | 8 |
+---+------------+
Whereas this query:
而这个查询:
SELECT DISTINCT
v,
DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number
... produces what you probably want in this case:
...在这种情况下产生您可能想要的内容:
+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
+---+------------+
Note that the ORDER BY
clause of the DENSE_RANK()
function will need all other columns from the SELECT DISTINCT
clause to work properly.
请注意,该函数的ORDER BY
子句DENSE_RANK()
需要该子句中的所有其他列SELECT DISTINCT
才能正常工作。
The reason for this is that logically, window functions are calculated before DISTINCT
is applied.
这样做的原因是从逻辑上讲,窗口函数是在DISTINCT
应用之前计算的。
All three functions in comparison
三种功能对比
Using PostgreSQL / Sybase / SQL standard syntax (WINDOW
clause):
使用 PostgreSQL/Sybase/SQL 标准语法(WINDOW
子句):
SELECT
v,
ROW_NUMBER() OVER (window) row_number,
RANK() OVER (window) rank,
DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v
... you'll get:
... 你会得到:
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
回答by NotMe
Quite a bit:
相当一点:
The rank of a row is one plus the number of ranks that come before the row in question.
行的排名是相关行之前的排名数加一。
Row_number is the distinct rank of rows, without any gap in the ranking.
Row_number 是行的不同排名,排名没有任何差距。
回答by DSR
Simple query without partition clause:
没有分区子句的简单查询:
select
sal,
RANK() over(order by sal desc) as Rank,
DENSE_RANK() over(order by sal desc) as DenseRank,
ROW_NUMBER() over(order by sal desc) as RowNumber
from employee
Output:
输出:
--------|-------|-----------|----------
sal |Rank |DenseRank |RowNumber
--------|-------|-----------|----------
5000 |1 |1 |1
3000 |2 |2 |2
3000 |2 |2 |3
2975 |4 |3 |4
2850 |5 |4 |5
--------|-------|-----------|----------
回答by sansalk
Look this example.
看看这个例子。
CREATE TABLE [dbo].#TestTable(
[id] [int] NOT NULL,
[create_date] [date] NOT NULL,
[info1] [varchar](50) NOT NULL,
[info2] [varchar](50) NOT NULL,
)
Insert some data
插入一些数据
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/2/09', 'Red', 'Yellow')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/3/09', 'Orange', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/1/09', 'Yellow', 'Blue')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/5/09', 'Blue', 'Orange')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/2/09', 'Green', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/8/09', 'Red', 'Blue')
Repeat same Values for 1
对 1 重复相同的值
INSERT INTO dbo.#TestTable (id, create_date, info1, info2) VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2) VALUES (1, '1/1/09', 'Blue', 'Green')
Look All
查看全部
SELECT * FROM #TestTable
Look your results
看看你的结果
SELECT Id,
create_date,
info1,
info2,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY create_date DESC) AS RowId,
RANK() OVER(PARTITION BY Id ORDER BY create_date DESC) AS [RANK]
FROM #TestTable
Need to understand the different
需要了解不同
回答by user2629395
Also, pay attention to ORDER BY in PARTITION (Standard AdventureWorks db is used for example) when using RANK.
另外,在使用 RANK 时,请注意 PARTITION 中的 ORDER BY(例如使用标准 AdventureWorks db)。
SELECT as1.SalesOrderID, as1.SalesOrderDetailID, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderDetailId ) ranknodiff FROM Sales.SalesOrderDetail as1 WHERE SalesOrderId = 43659 ORDER BY SalesOrderDetailId;
SELECT as1.SalesOrderID, as1.SalesOrderDetailID, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderDetailId FROM .SalesOrderDetailId FROM ) SalesOrderId = 43659 ORDER BY SalesOrderDetailId;
Gives result:
给出结果:
的SalesOrderID SalesOrderDetailID rank_same_as_partition rank_salesorderdetailid43659 1 1 1
43659 2 1 2
43659 3 1 3
43659 4 1 4
43659 5 1 5
43659 6 1 6
43659 7 1 7
43659 8 1 8
43659 9 1 9
43659 10 110
43659 11 1 11
43659 12 1 12
But if change order by to (use OrderQty :
但是如果将订单更改为(使用 OrderQty :
SELECT as1.SalesOrderID, as1.OrderQty, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.OrderQty ) rank_orderqty FROM Sales.SalesOrderDetail as1 WHERE SalesOrderId = 43659 ORDER BY OrderQty;
SELECT as1.SalesOrderID, as1.OrderQty, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.OrderQHER ) rank.1SalesOrderQHER rank.1Salesqty SalesOrderId = 43659 ORDER BY OrderQty;
Gives:
给出:
的SalesOrderID OrderQty rank_salesorderid rank_orderqty43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 2 1 7
43659 2 1 7
43659 3 1 9
43659 3 1 9
43659 4 1 11
43659 6 1 12
Notice how the Rank changes when we use OrderQty (rightmost column second table) in ORDER BY and how it changes when we use SalesOrderDetailID (rightmost column first table) in ORDER BY.
请注意当我们在 ORDER BY 中使用 OrderQty(最右边的第二列表)时排名如何变化,以及当我们在 ORDER BY 中使用 SalesOrderDetailID(最右边的第一列表)时排名如何变化。
回答by SarahLaMont
I haven't done anything with rank, but I discovered this today with row_number().
我没有对排名做过任何事情,但我今天用 row_number() 发现了这一点。
select item, name, sold, row_number() over(partition by item order by sold) as row from table_name
This will result in some repeating row numbers since in my case each name holds all items. Each item will be ordered by how many were sold.
这将导致一些重复的行号,因为在我的情况下,每个名称都包含所有项目。每件商品将按售出数量排序。
+--------+------+-----+----+
|glasses |store1| 30 | 1 |
|glasses |store2| 35 | 2 |
|glasses |store3| 40 | 3 |
|shoes |store2| 10 | 1 |
|shoes |store1| 20 | 2 |
|shoes |store3| 22 | 3 |
+--------+------+-----+----+