Oracle SQL:如何为每个“组”/“集群”选择 N 条记录

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6685899/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 00:09:17  来源:igfitidea点击:

Oracle SQL: How to SELECT N records for each "group" / "cluster"

oracle

提问by Revious

I've got a table big_table, with 4 million record, they are clustered in 40 groups through a column called "process_type_cod". The list of values that this column may assume is in a second table. Let's call it small_table.

我有一个表big_table,有 400 万条记录,它们通过名为“ process_type_cod”的列聚集在 40 个组中。此列可能采用的值列表在第二个表中。我们称之为small_table

So, we have big_table with a NOT NULL FK called process_type_cod that points to small_table (assume the colum name is the same on both tables).

因此,我们有 big_table 和一个名为 process_type_cod 的 NOT NULL FK,它指向 small_table(假设两个表上的列名称相同)。

I want N record (i.e. 10) from big_table, for each recordof the small_table.

对于small_table的每条记录,我想要来自 big_table 的 N 条记录(即 10 个)。

I.e. 10 record from big_table related to the first record of small_table UNION 10 different record from big_table related to the second record of small table, and so on.

即来自 big_table 的 10 条记录与 small_table 的第一条记录相关联,来自 big_table 的 10 条不同记录与小表的第二条记录相关,依此类推。

Is it possible to obtain with a single SQL function?

是否可以使用单个 SQL 函数获取?

回答by Jordan Parmer

I recommend an analytical function such as rank() or row_number(). You could do this with hard-coded unions, but the analytical function does all the hard work for you.

我推荐一个分析函数,例如 rank() 或 row_number()。您可以使用硬编码的联合来做到这一点,但分析功能会为您完成所有艰苦的工作。

select *
from 
(
    select
      bt.col_a,
      bt.col_b,
      bt.process_type_cod,
      row_number() over ( partition by process_type_cod order by col_a nulls last ) rank
    from small_table st
    inner join big_table bt
      on st.process_type_cod = bt.process_type_cod
)
where rank < 11
;

You may not even need that join since big_table has all of the types you care about. In that case, just change the 'from clause' to use big_table and drop the join.

您甚至可能不需要那个连接,因为 big_table 拥有您关心的所有类型。在这种情况下,只需将“from 子句”更改为使用 big_table 并删除连接。

What this does is performs the query and then sorts the records using the 'order by' operator in the partition statement. For a given group (here we grouped by col_a), a numerical row number (i.e. 1, 2, 3, 4, 5, n+1...) is applied to each record consecutively. In the outer where clause, just filter by the records with a number lower than N.

它的作用是执行查询,然后使用分区语句中的“order by”运算符对记录进行排序。对于给定的组(这里我们通过 col_a 进行分组),一个数字行号(即 1, 2, 3, 4, 5, n+1...)被连续应用于每条记录。在外部 where 子句中,只需按数字小于 N 的记录进行过滤。