oracle:调整选择子句中的相关子查询

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5262579/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 23:00:36  来源:igfitidea点击:

oracle: tune correlated subqueries in select clause

oracleperformanceselectcorrelated-subquery

提问by shawno

On Oracle 10gR2, given the following query, it takes forever to run. This is because of all the correlated subqueries in the select clause. There has got to be a better way. I'm thinking that rewriting the correlated subqueries as joins would do the trick, or building the query in such a way as to only need one correlated subquery, or possibly using some type of analytic function, but so far it is beyond me. Any help would be much appreciated.

在 Oracle 10gR2 上,给定以下查询,运行需要很长时间。这是因为 select 子句中的所有相关子查询。必须有更好的方法。我认为将相关子查询重写为连接可以解决问题,或者以只需要一个相关子查询的方式构建查询,或者可能使用某种类型的分析函数,但到目前为止它超出了我的范围。任何帮助将非常感激。

Here is the query:

这是查询:

SELECT COL_1,
       TAB_1.COL_2                                 AS REPORT,
       (SELECT COL_3
        FROM   TAB_2
        WHERE  TAB_2.COL_1 = TAB_1.COL_1)          AS DEPOT,
       (SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES,
       (SELECT COUNT(DISTINCT( Trim(COL_10) ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_AFFECTED,
       (SELECT COUNT(TAB_3.COL_7)
        FROM   TAB_3
        WHERE  TAB_3.COL_7 = 1
               AND TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_CATEGORY1,
       (SELECT COUNT(TAB_3.COL_4)
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS U_SHORTAGES,
       (SELECT COUNT(TAB_3.COL_10)
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS U_AFFECTED,
       (SELECT COUNT(DISTINCT( Trim(TAB_3.COL_7) ))
        FROM   TAB_3
        WHERE  TAB_3.COL_7 = 1
               AND TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_3.COL_6 = TAB_1.COL_2
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Unsrv Asset')    AS UNSRV_CAT1,
       To_char(TAB_1.COL_11, 'MM/DD/YY')           AS REPORT_DATE
FROM   TAB_1;  

For me, to make things less complicated I tried to rewrite just the following, but I'm still at a loss:

对我来说,为了让事情不那么复杂,我试图重写以下内容,但我仍然不知所措:

    SELECT COL_1,
       (SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
        FROM   TAB_3
        WHERE  TAB_3.COL_9 = TAB_1.COL_1
               AND TAB_1.COL_2 = TAB_3.COL_6
               AND TAB_3.COL_5 IS NULL
               AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES
FROM   TAB_1;  

回答by Craig

This is quite a loaded question.. There is not enough info here (e.g. execution plan, record counts, available indexes, pysical location of data in table, etc) to be able to tune the query properly. The best anyone can do is give a guess. That being said.. here is my best guess:

这是一个非常棘手的问题。这里没有足够的信息(例如执行计划、记录计数、可用索引、表中数据的物理位置等)来正确调整查询。任何人都能做的最好的事情就是猜测。话虽这么说..这是我最好的猜测:

select distinct
    tab_1.col_1,
    tab_1.col_2 AS report,
    tab_2.col_3 AS depot,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then tab_3.col_4 end)) over (partition by tab_1.col_1, tab_1.col_2) AS p_shortages,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then TRIM (tab_3.col_10) end)) over (partition by tab_1.col_1, tab_1.col_2)  AS parts_affected,
    COUNT (case when tab_3.col_8 = 'Parts Shortage'  and tab_3.col_7 = 1 then tab_3.col_7 end) over (partition by tab_1.col_1, tab_1.col_2)  AS parts_category1,
    COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_4 end) over (partition by tab_1.col_1, tab_1.col_2)  AS u_shortages,
    COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_10 end) over (partition by tab_1.col_1, tab_1.col_2)  AS u_affected,
    COUNT (DISTINCT (case when tab_3.col_8 = 'Unsrv Asset' and tab_3.col_7 = 1 then TRIM(tab_3.col_7) end)) over (partition by tab_1.col_1, tab_1.col_2)  AS unsrv_cat1,
    TO_CHAR(tab_1.col_11, 'MM/DD/YY') AS report_date
from tab_1
    left outer join tab_2
        on tab_2.col_1 = tab_1.col_1
    left outer join tab_3
        on     tab_3.col_9 = tab_1.col_1
        AND tab_3.col_6 = tab_1.col_2 
        AND tab_3.col_5 IS NULL;

After looking closer at the query above to explain what I did, I modified it further. This was because I was grouping by TO_CHAR(tab_1.col_11, 'MM/DD/YY'), but saw this wasn't part of the correlated subquery criteria, so it had to change (unless the combination of tab_1.col_1 and col_2 are unique by themselves).

在仔细查看上面的查询以解释我所做的之后,我进一步修改了它。这是因为我按 TO_CHAR(tab_1.col_11, 'MM/DD/YY') 分组,但看到这不是相关子查询条件的一部分,因此必须更改(除非 tab_1.col_1 和 col_2 的组合是独一无二的)。

So now to try to explain this:

所以现在试着解释一下:

Basically what is happening in the original query is that for every row of tab_1, you are running several queries on tab_3. So instead of that, I changed it to an outer join on tab_3. Since I don't know the data, this had to be an outer join because a correlated subquery won't eliminate any rows from the final output, where an inner join might. I just joined to tab_3 once, since all of the subqueries were joining tab_3 back to tab_1 using the same fields. I just moved the specific logic for each subquery into a case statement inside the count so that the case would return null (and thus not get counted) if the criteria wasn't met. This newest version uses aggregate functions to get my calculations at the correct level (tab_1 col_1 and col_2 which is what the original subquery was joining based on). Since converting the subquery to an outer join could have possibly created more rows (if there isn't a 1-to-1 match between the tables) I added the distinct to only get one row for each row in tab_1. All rows for each row in tab_1 should be the same. If there were already duplicates in tab_1, you will have to do something a little more in-depth to get keep the number of records the same.

基本上,原始查询中发生的情况是,对于 tab_1 的每一行,您都在 tab_3 上运行多个查询。因此,我将其更改为 tab_3 上的外部联接。由于我不知道数据,这必须是外连接,因为相关子查询不会从最终输出中消除任何行,而内连接可能会在其中消除。我刚刚加入 tab_3 一次,因为所有子查询都使用相同的字段将 tab_3 加入回 tab_1。我只是将每个子查询的特定逻辑移到计数内的 case 语句中,以便如果不满足条件,则 case 将返回 null(因此不会被计算在内)。这个最新版本使用聚合函数使我的计算处于正确的级别(tab_1 col_1 和 col_2 这是原始子查询加入的基础)。由于将子查询转换为外连接可能会创建更多行(如果表之间没有 1 对 1 匹配),我添加了 distinct 以便只为 tab_1 中的每一行获取一行。tab_1 中每一行的所有行都应该相同。如果 tab_1 中已经存在重复项,您将不得不做一些更深入的事情以保持记录数相同。

Hopefully this makes sense. If you have questions, feel free to ask and I will do my best to explain further.

希望这是有道理的。如果您有任何问题,请随时提出,我会尽力进一步解释。

--------------------------- More explanation

--------------------------- 更多解释

@shawno: Kind of, but I don't really think of it much as a loop.. Using your simplified example, let's pretend your tables look like this:

@shawno:有点,但我并没有真正认为它是一个循环..使用您的简化示例,让我们假设您的表格如下所示:

TAB_1:

col_1           col_2
--------        ---------
A               B
C               D

TAB_3:

col_9           col_6          col_4
--------        ---------      ---------
A               B              X
A               B              Y
A               B              Z
C               D              X
C               D              X

Using the subquery method, you are looking at each row of tab_1 and then running a query against tab_3. So you would do:

使用子查询方法,您查看 tab_1 的每一行,然后针对 tab_3 运行查询。所以你会这样做:

for row col_1 = A, col_2 = B, run a select count(distinct(col_4)) on tab_3 where col_9 = A and col_6 = B. This returns the value 3, which is the value the subquery returns.

对于行 col_1 = A, col_2 = B,在 tab_3 上运行 select count(distinct(col_4)),其中 col_9 = A 和 col_6 = B。这将返回值 3,这是子查询返回的值。

for row col_1 = C, col_2 = D, run a select count(distinct(col_4)) on tab_3 where col_9 = C and col_6 = D. This returns the value 1, which is the value the subquery returns.

对于行 col_1 = C, col_2 = D,在 tab_3 上运行 select count(distinct(col_4)),其中 col_9 = C 和 col_6 = D。这将返回值 1,这是子查询返回的值。

Using the join method, you first join the tables giving you data like:

使用 join 方法,您首先连接表,为您提供如下数据:

col_1           col_2          col_9           col_6          col_4
--------        ---------      --------        ---------      ---------
A               B              A               B              X
A               B              A               B              Y
A               B              A               B              Z
C               D              C               D              X
C               D              C               D              X

So now you just have to make a query based on that data, doing a Count(distinct(col_4)) for each value of col_1, col_2. If you know what your data looks like, you can create a more efficient query, but the idea remains the same.

所以现在您只需要根据该数据进行查询,为 col_1、col_2 的每个值执行 Count(distinct(col_4))。如果您知道自己的数据是什么样的,就可以创建更高效​​的查询,但思路保持不变。

Hope this makes it a little more clear!

希望这能让它更清楚一点!

回答by naveen

The most efficient way to tune corelated subqueries is to have proper index on joining columns. Hence, I would like to suggest you to have proper indexes on the joining columns.

调整相关子查询的最有效方法是在连接列上设置适当的索引。因此,我建议您在连接列上设置适当的索引。

A composite index on Tab3 for col9, col6.

Tab3 上 col9、col6 的复合索引。

A composite index on Tab1 for col1 and col2.

Tab1 上 col1 和 col2 的复合索引。

An index on Tab3.col5 (if it has most nulls, else not create this one)

Tab3.col5 上的索引(如果它有最多的空值,否则不创建这个)

A function based index on tab3.col8 create index <index_name> on tab3(upper(col8))

tab3.col8 上基于函数的索引 create index <index_name> on tab3(upper(col8))

and change the query to use UPPER function say..UPPER(TAB_3.COL_8) = 'PARTS SHORTAGE'...

并更改查询以使用 UPPER 函数说.. UPPER(TAB_3.COL_8) = 'PARTS SHORTAGE'...

Remember to execute this statement before executing the query otherwise function based index is not useful.

请记住在执行查询之前执行此语句,否则基于函数的索引没有用。

alter session set query_rewrite_enabled=true(this makes optimiser to use FBI).

alter session set query_rewrite_enabled=true(这使得优化器可以使用 FBI)。

Then execute the query. Check the explain plan if the query is using all indexes properly... else add /*+Index() */ hints to use the indexes properly by the query.

然后执行查询。如果查询正确使用所有索引,请检查解释计划……否则添加 /*+Index() */ 提示以通过查询正确使用索引。