如何使用具有重复项的 SQL 从每个组中选择前 3 个值

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16720525/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 15:23:42  来源:igfitidea点击:

How to select top 3 values from each group in a table with SQL which have duplicates

sql

提问by PixelsTech

Assume we have a table which has two columns, one column contains the names of some people and the other column contains some values related to each person. One person can have more than one value. Each value has a numeric type. The question is we want to select the top 3 values for each person from the table. If one person has less than 3 values, we select all the values for that person.

假设我们有一个包含两列的表,一列包含一些人的姓名,另一列包含与每个人相关的一些值。一个人可以拥有多个值。每个值都有一个数字类型。问题是我们要从表中为每个人选择前 3 个值。如果一个人的值少于 3 个,我们会选择该人的所有值。

The issue can be solved if there are no duplicates in the table by the query provided in this article Select top 3 values from each group in a table with SQL . But if there are duplicates, what is the solution?

如果通过本文中提供的查询在表中没有重复项,则可以解决该问题使用 SQL 从表中的每个组中选择前 3 个值。但如果有重复,有什么解决办法呢?

For example, if for one name John, he has 5 values related to him. They are 20,7,7,7,4. I need to return the name/value pairs as below order by value descending for each name:

例如,如果对于一个名字 John,他有 5 个与他相关的值。它们是 20、7、7、7、4。我需要按每个名称的值降序返回名称/值对,如下所示:

-----------+-------+
| name     | value |
-----------+-------+
| John     |    20 |
| John     |     7 |
| John     |     7 |
-----------+-------+

Only 3 rows should be returned for John even though there are three 7s for John.

即使 John 有三个 7,也应该只为 John 返回 3 行。

回答by ypercube??

In many modern DBMS (e.g. Postgres, Oracle, SQL-Server, DB2 and many others), the following will work just fine. It uses CTEs and ranking function ROW_NUMBER()which is part of the latest SQL standard:

在许多现代 DBMS(例如 Postgres、Oracle、SQL-Server、DB2 和许多其他的)中,以下内容将正常工作。它使用 CTE 和排名函数ROW_NUMBER(),这是最新 SQL 标准的一部分:

 WITH cte AS
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  )
SELECT name, value, rn
FROM cte
WHERE rn <= 3
ORDER BY name, rn ;

Without CTE, only ROW_NUMBER():

没有 CTE,只有ROW_NUMBER()

SELECT name, value, rn
FROM 
  ( SELECT name, value,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY value DESC
                             )
             AS rn
    FROM t
  ) tmp 
WHERE rn <= 3
ORDER BY name, rn ; 

Tested in:

测试于:



In MySQL and other DBMS that do not have ranking functions, one has to use either derived tables, correlated subqueries or self-joins with GROUP BY.

在 MySQL 和其他没有排名功能的 DBMS 中,必须使用派生表、相关子查询或与GROUP BY.

The (tid)is assumed to be the primary key of the table:

(tid)被假定为表中的主键:

SELECT t.tid, t.name, t.value,              -- self join and GROUP BY
       COUNT(*) AS rn
FROM t
  JOIN t AS t2
    ON  t2.name = t.name
    AND ( t2.value > t.value
        OR  t2.value = t.value
        AND t2.tid <= t.tid
        )
GROUP BY t.tid, t.name, t.value
HAVING COUNT(*) <= 3
ORDER BY name, rn ;


SELECT t.tid, t.name, t.value, rn
FROM
  ( SELECT t.tid, t.name, t.value,
           ( SELECT COUNT(*)                -- inline, correlated subquery
             FROM t AS t2
             WHERE t2.name = t.name
              AND ( t2.value > t.value
                 OR  t2.value = t.value
                 AND t2.tid <= t.tid
                  )
           ) AS rn
    FROM t
  ) AS t
WHERE rn <= 3
ORDER BY name, rn ;

Tested in MySQL

MySQL 中测试

回答by Gordon Linoff

I was going to downvote the question. However, I realized that it might really be asking for a cross-database solution.

我打算否决这个问题。但是,我意识到它可能真的需要跨数据库解决方案。

Assuming you are looking for a database independent way to do this, the only way I can think of uses correlated subqueries (or non-equijoins). Here is an example:

假设您正在寻找一种独立于数据库的方法来执行此操作,我能想到的唯一方法是使用相关子查询(或非等值联接)。下面是一个例子:

select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3)

However, each database that you mention (and I note, Hadoop is not a database) has a better way of doing this. Unfortunately, none of them are standard SQL.

但是,您提到的每个数据库(我注意到,Hadoop 不是数据库)都有更好的方法来执行此操作。不幸的是,它们都不是标准的 SQL。

Here is an example of it working in SQL Server:

这是它在 SQL Server 中工作的示例:

with t as (
      select 1 as personid, 5 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 6 as val union all
      select 1 as personid, 7 as val union all
      select 1 as personid, 8 as val
     )
select distinct t.personid, val, rank
from (select t.*,
             (select COUNT(distinct val) from t t2 where t2.personid = t.personid and t2.val >= t.val
             ) as rank
      from t
     ) t
where rank in (1, 2, 3);

回答by Deval Shah

Using GROUP_CONCATand FIND_IN_SETyou can do that.Check SQLFIDDLE.

使用GROUP_CONCATFIND_IN_SET你可以做到这一点。检查SQLFIDDLE

SELECT *
FROM tbl t
WHERE FIND_IN_SET(t.value,(SELECT
                             SUBSTRING_INDEX(GROUP_CONCAT(t1.value ORDER BY VALUE DESC),',',3)
                           FROM tbl t1
                           WHERE t1.name = t.name
                           GROUP BY t1.name)) > 0
ORDER BY t.name,t.value desc

回答by Deval Shah

If your result set is not so heavy, you can write a stored procedure (or an anonymous PL/SQL-block) for that problem which iterates the result set and finds the bigges three by a simple comparing algorithm.

如果您的结果集不是那么重,您可以为该问题编写一个存储过程(或匿名 PL/SQL 块),它迭代结果集并通过简单的比较算法找到三个大的。

回答by rplusm

Try this -

尝试这个 -

CREATE TABLE #list ([name] [varchar](100) NOT NULL, [value] [int] NOT NULL)
INSERT INTO #list VALUES ('John', 20), ('John', 7), ('John', 7), ('John', 7), ('John', 4);

WITH cte
AS (
SELECT NAME
    ,value
    ,ROW_NUMBER() OVER (
        PARTITION BY NAME ORDER BY (value) DESC
        ) RN
FROM #list
)
SELECT NAME
,value
FROM cte
WHERE RN < 4
ORDER BY value DESC

回答by WernerCD

This works for MS SQL. Should be workable in any other SQL dialect that has the ability to assign row numbers in a group by or over clause (or equivelant)

这适用于 MS SQL。应该适用于任何其他能够在 group by 或 over 子句(或等价)中分配行号的 SQL 方言

if object_id('tempdb..#Data') is not null drop table #Data;
GO

create table #data (name varchar(25), value integer);
GO
set nocount on;
insert into #data values ('John', 20);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 7);
insert into #data values ('John', 5);
insert into #data values ('Hyman', 5);
insert into #data values ('Jane', 30);
insert into #data values ('Jane', 21);
insert into #data values ('John', 5);
insert into #data values ('John', -1);
insert into #data values ('John', -1);
insert into #data values ('Jane', 18);
set nocount off;
GO

with D as (
SELECT
     name
    ,Value
    ,row_number() over (partition by name order by value desc) rn
From
    #Data
)
SELECT Name, Value
FROM D
WHERE RN <= 3
order by Name, Value Desc

Name    Value
Hyman    5
Jane    30
Jane    21
Jane    18
John    20
John    7
John    7