SQL 为表中的每个组选择前 N 行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3998529/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
selecting top N rows for each group in a table
提问by Harsh
I am facing a very common issue regarding "Selecting top N rows for each group in a table".
我面临一个关于“为表中的每个组选择前 N 行”的非常常见的问题。
Consider a table with id, name, hair_colour, score
columns.
考虑一个带有id, name, hair_colour, score
列的表。
I want a resultset such that, for each hair colour, get me top 3 scorer names.
我想要一个结果集,对于每种头发颜色,让我获得前 3 名得分手的名字。
To solve this i got exactly what i need on Rick Osborne's blogpost "sql-getting-top-n-rows-for-a-grouped-query"
为了解决这个问题,我在Rick Osborne 的博客文章“sql-getting-top-n-rows-for-a-grouped-query”上得到了我所需要的
That solution doesn't work as expected when my scores are equal.
当我的分数相等时,该解决方案无法按预期工作。
In above example the result as follow.
在上面的例子中,结果如下。
id name hair score ranknum
---------------------------------
12 Kit Blonde 10 1
9 Becca Blonde 9 2
8 Katie Blonde 8 3
3 Sarah Brunette 10 1
4 Deborah Brunette 9 2 - ------- - - > if
1 Kim Brunette 8 3
Consider the row 4 Deborah Brunette 9 2
. If this also has same score (10) same as Sarah, then ranknum will be 2,2,3 for "Brunette" type of hair.
考虑行4 Deborah Brunette 9 2
。如果这也有与莎拉相同的分数 (10),那么“黑发”类型的头发的等级将是 2,2,3。
What's the solution to this?
解决这个问题的方法是什么?
回答by marc_s
If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
如果您使用的是 SQL Server 2005 或更新版本,您可以使用排名函数和 CTE 来实现这一点:
;WITH HairColors AS
(SELECT id, name, hair, score,
ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair
column, and each partition is then order by score (descending) and gets a row number; the highest score for each partition is 1, then 2 etc.
此 CTE 将按hair
列的值对您的数据进行“分区” ,然后每个分区按分数(降序)排序并获得行号;每个分区的最高分是 1,然后是 2,以此类推。
So if you want to the TOP 3 of each group, select only those rows from the CTE that have a RowNum
of 3 or less (1, 2, 3) --> there you go!
因此,如果您想获得每组的前 3 名,请仅从 CTE 中选择 aRowNum
为 3 或更少 (1, 2, 3) 的那些行--> 就这样!
回答by Andrzej Doyle
The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like
算法提出排名的方法是计算叉积中得分等于或大于相关女孩的行数,以生成排名。因此,在你所说的问题案例中,莎拉的网格看起来像
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
黛博拉也是如此,这就是为什么这两个女孩在这里都获得了 2 的排名。
The problem is that when there's a tie, all girls take the lowestvalue in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:
问题是,当出现平局时,由于此计数,所有女孩都取平局范围内的最低值,而您希望她们取而代之的是最高值。我认为一个简单的改变可以解决这个问题:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:
而不是大于或等于比较,使用严格大于比较来计算严格意义上更好的女孩的数量。然后,再加一个,你就有了你的等级(这将酌情处理关系)。所以内部选择将是:
SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3
Can anyone see any problems with this approach that have escaped my notice?
任何人都可以看到这种方法有什么问题没有引起我的注意吗?
回答by malex
Use this compound select which handles OP problem properly
使用这个复合选择来正确处理 OP 问题
SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)
Note that you need to use IFNULL here to handle case when table girlshas less rows for some type of hairthen we want to see in sql answer (in OP case it is 3 items).
请注意,您需要在此处使用 IFNULL 来处理某些类型头发的表女孩行较少的情况,然后我们希望在 sql 答案中看到(在 OP 情况下,它是 3 个项目)。