如何使用具有重复项的 SQL 从每个组中选择前 3 个值
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16720525/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to select top 3 values from each group in a table with SQL which have duplicates
提问by PixelsTech
Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.
假设我们有一个包含两列的表,一列包含一些人的姓名,另一列包含与每个人相关的一些值。一个人可以拥有多个值。每个值都有一个数字类型。问题是我们要从表中为每个人选择前 3 个值。如果一个人的值少于 3 个,我们会选择该人的所有值。
The issue can be solved if there are no duplicates in the table by the query provided in this article Select top 3 values from each group in a table with SQL . But if there are duplicates, what is the solution?
如果通过本文中提供的查询在表中没有重复项,则可以解决该问题使用 SQL 从表中的每个组中选择前 3 个值。但如果有重复,有什么解决办法呢?
For example, if for one name John, he has 5 values related to him. They are 20,7,7,7,4. I need to return the name/value pairs as below order by value descending for each name:
例如,如果对于一个名字 John,他有 5 个与他相关的值。它们是 20、7、7、7、4。我需要按每个名称的值降序返回名称/值对,如下所示:
-----------+-------+
| name | value |
-----------+-------+
| John | 20 |
| John | 7 |
| John | 7 |
-----------+-------+
Only 3 rows should be returned for John even though there are three 7s for John.
即使 John 有三个 7,也应该只为 John 返回 3 行。
回答by ypercube??
In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER()
which is part of the latest SQL standard:
在许多现代 DBMS(例如 Postgres、Oracle、SQL-Server、DB2 和许多其他的)中,以下内容将正常工作。它使用 CTE 和排名函数ROW_NUMBER()
,这是最新 SQL 标准的一部分:
WITH cte AS
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
)
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;
Without CTE, only ROW_NUMBER()
:
没有 CTE,只有ROW_NUMBER()
:
SELECT name, value, rn
FROM
( SELECT name, value,
ROW_NUMBER() OVER (PARTITION BY name
ORDER BY value DESC
)
AS rn
FROM t
) tmp
WHERE rn <= 3
ORDER BY name, rn ;
Tested in:
测试于:
In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY
.
在 MySQL 和其他没有排名功能的 DBMS 中,必须使用派生表、相关子查询或与GROUP BY
.
The (tid)
is assumed to be the primary key of the table:
将(tid)
被假定为表中的主键:
SELECT t.tid, t.name, t.value, -- self join and GROUP BY
COUNT(*) AS rn
FROM t
JOIN t AS t2
ON t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;
SELECT t.tid, t.name, t.value, rn
FROM
( SELECT t.tid, t.name, t.value,
( SELECT COUNT(*) -- inline, correlated subquery
FROM t AS t2
WHERE t2.name = t.name
AND ( t2.value > t.value
OR t2.value = t.value
AND t2.tid <= t.tid
)
) AS rn
FROM t
) AS t
WHERE rn <= 3
ORDER BY name, rn ;
Tested in MySQL
在MySQL 中测试
回答by Gordon Linoff
I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.
我打算否决这个问题。但是,我意识到它可能真的需要跨数据库解决方案。
Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:
假设您正在寻找一种独立于数据库的方法来执行此操作,我能想到的唯一方法是使用相关子查询(或非等值联接)。下面是一个例子:
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3)
However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.
但是,您提到的每个数据库(我注意到,Hadoop 不是数据库)都有更好的方法来执行此操作。不幸的是,它们都不是标准的 SQL。
Here is an example of it working in SQL Server:
这是它在 SQL Server 中工作的示例:
with t as (
select 1 as personid, 5 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 6 as val union all
select 1 as personid, 7 as val union all
select 1 as personid, 8 as val
)
select distinct t.personid, val, rank
from (select t.*,
(select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
) as rank
from t
) t
where rank in (1, 2, 3);
回答by Deval Shah
Using GROUP_CONCAT
and FIND_IN_SET
you can do that.Check SQLFIDDLE.
使用GROUP_CONCAT
,FIND_IN_SET
你可以做到这一点。检查SQLFIDDLE。
SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
FROM tbl t1
WHERE t1.name = t.name
GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc
回答by Deval Shah
If your result set is not so heavy, you can write a stored procedure (or an anonymous PL/SQL-block) for that problem which iterates the result set and finds the bigges three by a simple comparing algorithm.
如果您的结果集不是那么重,您可以为该问题编写一个存储过程(或匿名 PL/SQL 块),它迭代结果集并通过简单的比较算法找到三个大的。
回答by rplusm
Try this -
尝试这个 -
CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);
WITH cte
AS (
SELECT NAME
,value
,ROW_NUMBER() OVER (
PARTITION BY NAME ORDER BY (value) DESC
) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC
回答by WernerCD
This works for MS SQL. Should be workable in any other SQL dialect that has the ability to assign row numbers in a group by or over clause (or equivelant)
这适用于 MS SQL。应该适用于任何其他能够在 group by 或 over 子句(或等价)中分配行号的 SQL 方言
if object_id('tempdb..#Data') is not null drop table #Data;
GO
create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Hyman', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO
with D as (
SELECT
name
,Value
,row_number() over (partition by name order by value desc) rn
From
#Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc
Name Value
Hyman 5
Jane 30
Jane 21
Jane 18
John 20
John 7
John 7