SQL 为表中的每个组选择前 N 行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3998529/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 08:01:54  来源:igfitidea点击:

selecting top N rows for each group in a table

sqlranking

提问by Harsh

I am facing a very common issue regarding "Selecting top N rows for each group in a table".

我面临一个关于“为表中的每个组选择前 N 行”的非常常见的问题。

Consider a table with id, name, hair_colour, scorecolumns.

考虑一个带有id, name, hair_colour, score列的表。

I want a resultset such that, for each hair colour, get me top 3 scorer names.

我想要一个结果集,对于每种头发颜色,让我获得前 3 名得分手的名字。

To solve this i got exactly what i need on Rick Osborne's blogpost "sql-getting-top-n-rows-for-a-grouped-query"

为了解决这个问题,我在Rick Osborne 的博客文章“sql-getting-top-n-rows-for-a-grouped-query”上得到了我所需要的

That solution doesn't work as expected when my scores are equal.

当我的分数相等时,该解决方案无法按预期工作。

In above example the result as follow.

在上面的例子中,结果如下。

 id  name  hair  score  ranknum
---------------------------------
 12  Kit    Blonde  10  1
  9  Becca  Blonde  9  2
  8  Katie  Blonde  8  3
  3  Sarah  Brunette 10  1    
  4  Deborah Brunette 9  2 - ------- - - > if
  1  Kim  Brunette 8  3

Consider the row 4 Deborah Brunette 9 2. If this also has same score (10) same as Sarah, then ranknum will be 2,2,3 for "Brunette" type of hair.

考虑行4 Deborah Brunette 9 2。如果这也有与莎拉相同的分数 (10),那么“黑发”类型的头发的等级将是 2,2,3。

What's the solution to this?

解决这个问题的方法是什么?

回答by marc_s

If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:

如果您使用的是 SQL Server 2005 或更新版本,您可以使用排名函数和 CTE 来实现这一点:

;WITH HairColors AS
(SELECT id, name, hair, score, 
        ROW_NUMBER() OVER(PARTITION BY hair ORDER BY score DESC) as 'RowNum'
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3

This CTE will "partition" your data by the value of the haircolumn, and each partition is then order by score (descending) and gets a row number; the highest score for each partition is 1, then 2 etc.

此 CTE 将按hair列的值对您的数据进行“分区” ,然后每个分区按分数(降序)排序并获得行号;每个分区的最高分是 1,然后是 2,以此类推。

So if you want to the TOP 3 of each group, select only those rows from the CTE that have a RowNumof 3 or less (1, 2, 3) --> there you go!

因此,如果您想获得每组的前 3 名,请仅从 CTE 中选择 aRowNum为 3 或更少 (1, 2, 3) 的那些行--> 就这样!

回答by Andrzej Doyle

The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like

算法提出排名的方法是计算叉积中得分等于或大于相关女孩的行数,以生成排名。因此,在你所说的问题案例中,莎拉的网格看起来像

a.name | a.score | b.name  | b.score
-------+---------+---------+--------
Sarah  | 9       | Sarah   | 9
Sarah  | 9       | Deborah | 9

and similarly for Deborah, which is why both girls get a rank of 2 here.

黛博拉也是如此,这就是为什么这两个女孩在这里都获得了 2 的排名。

The problem is that when there's a tie, all girls take the lowestvalue in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:

问题是,当出现平局时,由于此计数,所有女孩都取平局范围内的最低值,而您希望她们取而代之的是最高值。我认为一个简单的改变可以解决这个问题:

Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:

而不是大于或等于比较,使用严格大于比较来计算严格意义上更好的女孩的数量。然后,再加一个,你就有了你的等级(这将酌情处理关系)。所以内部选择将是:

SELECT a.id, COUNT(*) + 1 AS ranknum
FROM girl AS a
  INNER JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(*) <= 3

Can anyone see any problems with this approach that have escaped my notice?

任何人都可以看到这种方法有什么问题没有引起我的注意吗?

回答by malex

Use this compound select which handles OP problem properly

使用这个复合选择来正确处理 OP 问题

SELECT g.* FROM girls as g
WHERE g.score > IFNULL( (SELECT g2.score FROM girls as g2
                WHERE g.hair=g2.hair ORDER BY g2.score DESC LIMIT 3,1), 0)

Note that you need to use IFNULL here to handle case when table girlshas less rows for some type of hairthen we want to see in sql answer (in OP case it is 3 items).

请注意,您需要在此处使用 IFNULL 来处理某些类型头发的女孩行较少的情况,然后我们希望在 sql 答案中看到(在 OP 情况下,它是 3 个项目)。