SQL Oracle "PARTITION BY" keyword

Question

Asked by Alex Beardsley

Can someone please explain what the PARTITION BY keyword does and give a simple example of it in action, as well as why one would want to use it? I have a SQL query written by someone else and I'm trying to figure out what it does.

An example of partition by:

SELECT empno, deptno, COUNT(*) 
OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp

The examples I've seen online seem a bit too in-depth.

Answer 1

Answered by Guy

The PARTITION BY clause sets the range of records that will be used for each "GROUP" within the OVER clause.

In your example SQL, DEPT_COUNT will return the number of employees within that department for every employee record. (It is as if you're de-normalising the emp table; you still return every record in the emp table.)

emp_no  dept_no  DEPT_COUNT
1       10       3
2       10       3
3       10       3 <- three because there are three "dept_no = 10" records
4       20       2
5       20       2 <- two because there are two "dept_no = 20" records
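
The table above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made for convenience; window functions need SQLite 3.25 or later), and the five emp rows are made up to match the table.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20)],  # sample rows, not real data
)

# Every emp row is kept; COUNT(*) is computed over the rows sharing its deptno.
dept_counts = conn.execute(
    """
    SELECT empno, deptno, COUNT(*) OVER (PARTITION BY deptno) AS dept_count
    FROM emp
    ORDER BY empno
    """
).fetchall()

for row in dept_counts:
    print(row)
```

Each employee row comes back once, carrying the size of its own department.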

If there was another column (e.g., state), then you could count how many departments there are in that state.

It is like getting the results of a GROUP BY (SUM, AVG, etc.) without aggregating the result set (i.e. removing matching records).

It is useful when you use the analytic MIN(...) OVER or MAX(...) OVER functions to get, for example, the lowest and highest salary in the department and then use that in a calculation against this record's salary without a sub-select, which is typically much faster.
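
A minimal sketch of that idea follows: each row's salary is compared with its department's minimum and maximum in a single query, with no sub-select. The sal column and the sample values are assumptions for illustration, and SQLite (via Python's sqlite3, version 3.25+ for window functions) stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Oracle (assumption)
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [  # hypothetical salaries
        (1, 10, 1000), (2, 10, 3000), (3, 10, 2000),
        (4, 20, 1500), (5, 20, 2500),
    ],
)

# MIN/MAX are evaluated per department but attached to every row,
# so the row's own salary can be used in the same expression.
salary_spread = conn.execute(
    """
    SELECT empno, deptno, sal,
           sal - MIN(sal) OVER (PARTITION BY deptno) AS above_dept_min,
           MAX(sal) OVER (PARTITION BY deptno) - sal AS below_dept_max
    FROM emp
    ORDER BY empno
    """
).fetchall()

for row in salary_spread:
    print(row)
```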

Read the linked AskTom article for further details.

Answer 2

Answered by Andrejs

The concept is very well explained by the accepted answer, but I find that the more examples one sees, the better it sinks in. Here's an incremental example:

1) The boss says: "Get me the number of items we have in stock, grouped by brand."

You say: "no problem"

SELECT 
      BRAND
      ,COUNT(ITEM_ID) 
FROM 
      ITEMS
GROUP BY 
      BRAND;

Result:

+--------------+---------------+
|  Brand       |   Count       | 
+--------------+---------------+
| H&M          |     50        |
+--------------+---------------+
| Hugo Boss    |     100       |
+--------------+---------------+
| No brand     |     22        |
+--------------+---------------+

2) The boss says: "Now get me a list of all items, with their brand AND the number of items that the respective brand has."

You may try:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) 
 FROM 
      ITEMS
 GROUP BY 
      BRAND;

But you get:

ORA-00979: not a GROUP BY expression

This is where OVER (PARTITION BY BRAND) comes in:

 SELECT 
      ITEM_NR
      ,BRAND
      ,COUNT(ITEM_ID) OVER (PARTITION BY BRAND) 
 FROM 
      ITEMS;

Which means:

COUNT(ITEM_ID) - get the number of items
OVER - over the set of rows
(PARTITION BY BRAND) - that have the same brand

And the result is:

+--------------+---------------+----------+
|  Items       |  Brand        | Count()  |
+--------------+---------------+----------+
|  Item 1      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 2      |  Hugo Boss    |   100    | 
+--------------+---------------+----------+
|  Item 3      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 4      |  No brand     |   22     | 
+--------------+---------------+----------+
|  Item 5      |  H&M          |   50     | 
+--------------+---------------+----------+

etc.
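
To see the two steps side by side, here is a small sketch with a four-row ITEMS table. The rows are invented for illustration, and SQLite (through Python's sqlite3, version 3.25+ for window functions) is assumed as a stand-in for Oracle. GROUP BY collapses the rows to one per brand, while OVER (PARTITION BY BRAND) keeps one row per item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Oracle (assumption)
conn.execute("CREATE TABLE items (item_id INTEGER, item_nr TEXT, brand TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        (1, "Item 1", "Hugo Boss"),
        (2, "Item 2", "Hugo Boss"),
        (3, "Item 3", "No brand"),
        (4, "Item 4", "H&M"),
    ],
)

# Step 1: one row per brand.
grouped = conn.execute(
    "SELECT brand, COUNT(item_id) FROM items GROUP BY brand ORDER BY brand"
).fetchall()

# Step 2: one row per item, each carrying its brand's count.
windowed = conn.execute(
    """
    SELECT item_nr, brand, COUNT(item_id) OVER (PARTITION BY brand) AS brand_count
    FROM items
    ORDER BY item_id
    """
).fetchall()

print(grouped)
print(windowed)
```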

Answer 3

Answered by user60890

It is the SQL extension called analytics. The OVER in the SELECT statement tells Oracle that the function is an analytic function, not a GROUP BY (aggregate) function. The advantage of using analytics is that you can collect sums, counts, and a lot more with just one pass through the data instead of looping through the data with sub-selects or, worse, PL/SQL.

It does look confusing at first, but it will quickly become second nature. No one explains it better than Tom Kyte, so the link above is great.

Of course, reading the documentation is a must.

Answer 4

Answered by user60890

EMPNO     DEPTNO DEPT_COUNT

 7839         10          4
 5555         10          4
 7934         10          4
 7782         10          4 --- 4 records in table for dept 10
 7902         20          4
 7566         20          4
 7876         20          4
 7369         20          4 --- 4 records in table for dept 20
 7900         30          6
 7844         30          6
 7654         30          6
 7521         30          6
 7499         30          6
 7698         30          6 --- 6 records in table for dept 30

Here we get the count for each deptno. For deptno 10 there are 4 records in table emp, and similarly 4 records for deptno 20 and 6 for deptno 30.

Answer 5

Answered by issam

The OVER (PARTITION BY ...) clause works as if we were partitioning the data by client_id, creating a subset of rows for each client id:

select client_id, operation_date,
       -- number of rows that share this row's client_id
       count(*) over (partition by client_id) as operationctrbyclient
from client_operations e
order by e.client_id;

This query returns, on every row, the number of operations done by that client_id.
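
As a hedged illustration, the sketch below runs a per-client count next to ROW_NUMBER(), which numbers the operations within each client. The operation_seq alias and the sample dates are assumptions, and SQLite (Python's sqlite3, version 3.25+ for window functions) stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Oracle (assumption)
conn.execute("CREATE TABLE client_operations (client_id INTEGER, operation_date TEXT)")
conn.executemany(
    "INSERT INTO client_operations VALUES (?, ?)",
    [  # hypothetical sample rows
        (1, "2024-01-05"), (1, "2024-01-09"), (1, "2024-02-01"),
        (2, "2024-01-07"), (2, "2024-03-12"),
    ],
)

# ROW_NUMBER() needs an ORDER BY inside the window to define the sequence;
# COUNT(*) over the same partition gives the per-client total on every row.
operations = conn.execute(
    """
    SELECT client_id, operation_date,
           ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY operation_date) AS operation_seq,
           COUNT(*) OVER (PARTITION BY client_id) AS operationctrbyclient
    FROM client_operations
    ORDER BY client_id, operation_date
    """
).fetchall()

for row in operations:
    print(row)
```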

Answer 6

Answered by georgejo

I think this example shows a small nuance in how partitioning works compared with how GROUP BY works. My example is from Oracle 12, in case what I describe turns out to be a compiler bug.

I tried:

SELECT t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1) total_rows
from mytable t
group by t.data_key  ---- This does not compile as the compiler feels that t.state isn't in the group by and doesn't recognize the aggregation I'm looking for

This, however, works as expected:

SELECT distinct t.data_key
,      SUM ( CASE when t.state = 'A' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_a_rows
,      SUM ( CASE when t.state = 'B' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_b_rows
,      SUM ( CASE when t.state = 'C' THEN 1 ELSE 0 END) 
OVER   (PARTITION BY t.data_key) count_c_rows
,      COUNT (1)
OVER   (PARTITION BY t.data_key) total_rows
from mytable t;

This produces the number of rows in each state for each value of the key "data_key". So if data_key = 'APPLE' had 3 rows with state 'A', 2 rows with state 'B', and 1 row with state 'C', the corresponding row for 'APPLE' would be 'APPLE', 3, 2, 1, 6.
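
The 'APPLE' case can be checked with a short sketch. It assumes SQLite (Python's sqlite3, version 3.25+ for window functions) as a stand-in for Oracle, and the 'PEAR' rows are extra made-up data. It also shows that a plain GROUP BY with the same conditional sums, and no OVER clauses, yields the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Oracle (assumption)
conn.execute("CREATE TABLE mytable (data_key TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    # APPLE: 3 x 'A', 2 x 'B', 1 x 'C'; PEAR rows are hypothetical extras
    [("APPLE", s) for s in "AAABBC"] + [("PEAR", "A"), ("PEAR", "C")],
)

# Analytic version: every row carries its partition's totals, DISTINCT dedupes them.
windowed = conn.execute(
    """
    SELECT DISTINCT data_key,
           SUM(CASE WHEN state = 'A' THEN 1 ELSE 0 END) OVER (PARTITION BY data_key) AS count_a_rows,
           SUM(CASE WHEN state = 'B' THEN 1 ELSE 0 END) OVER (PARTITION BY data_key) AS count_b_rows,
           SUM(CASE WHEN state = 'C' THEN 1 ELSE 0 END) OVER (PARTITION BY data_key) AS count_c_rows,
           COUNT(1) OVER (PARTITION BY data_key) AS total_rows
    FROM mytable
    ORDER BY data_key
    """
).fetchall()

# Aggregate version: GROUP BY alone, with no OVER clauses.
grouped = conn.execute(
    """
    SELECT data_key,
           SUM(CASE WHEN state = 'A' THEN 1 ELSE 0 END) AS count_a_rows,
           SUM(CASE WHEN state = 'B' THEN 1 ELSE 0 END) AS count_b_rows,
           SUM(CASE WHEN state = 'C' THEN 1 ELSE 0 END) AS count_c_rows,
           COUNT(1) AS total_rows
    FROM mytable
    GROUP BY data_key
    ORDER BY data_key
    """
).fetchall()

print(windowed)
print(grouped)
```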
