database 为什么oracle表索引了但仍然做全表扫描?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14484485/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 08:51:01  来源:igfitidea点击:

Why oracle table indexed but still do full table scan?

databaseperformanceoracleoracle11gquery-optimization

提问by C.c

I have a table 'MSATTRIBUTE' with 3000K rows. I used the following query to retrieve data, this query has different execution plan with same DB data but in different env. in one env, it appears full scan so the query is very slow, but in another env it all used index scan it's quite good, everybody who knows why it have full table scan in one env because I built index for them, how do I let become index scan just like what I tested in env 1. how I can improve this query?

我有一个包含 3000K 行的表“MSATTRIBUTE”。我使用以下查询来检索数据,此查询具有不同的执行计划,具有相同的数据库数据但在不同的环境中。在一个环境中,它似乎是全扫描,所以查询很慢,但在另一个环境中,它都使用索引扫描,这非常好,每个人都知道为什么在一个环境中进行全表扫描,因为我为他们建立了索引,我该怎么做让成为索引扫描就像我在 env 1 中测试的那样。我如何改进这个查询?

回答by APC

without understanding way more than I care to know about your data model and you business it's hard to give concrete positive advice. But here are some notes about your indexing strategy and why I would guess the optimizer is not using the indxes you have.

如果不了解比我更关心了解您的数据模型和您的业务的方式,则很难给出具体的积极建议。但是这里有一些关于您的索引策略的说明,以及为什么我猜优化器没有使用您拥有的索引。

In the sub-query the access path to REDLINE_MSATTRIBUTE drives from three columns:

在子查询中,从三列到 REDLINE_MSATTRIBUTE 驱动器的访问路径:

  • CLASS
  • OBJECT_ID
  • CHANGE_RELEASE_DATE.
  • 班级
  • 对象 ID
  • CHANGE_RELEASE_DATE。

CLASS is not indexed. but that is presumably not very selective. OBJECT_ID is the leading column of a compound index but the other columns are irrelevant the sub-query.

CLASS 未编入索引。但这大概不是很有选择性。OBJECT_ID 是复合索引的前导列,但其他列与子查询无关。

But the biggest problem is CHANGE_RELEASE_DATE. This is not indexed at all. Which is bad news, as your one primary key look up produces a date which is then compared with CHANGE_RELEASE_DATE. If a column is not indexed teh database has to read the table to get its values.

但最大的问题是 CHANGE_RELEASE_DATE。这根本没有索引。这是坏消息,因为您的一个主键查找会生成一个日期,然后将其与 CHANGE_RELEASE_DATE 进行比较。如果一个列没有被索引,那么数据库必须读取该表来获取它的值。

The main query drives off

主查询驱动器关闭

  • ATTID
  • CHANGE_ID
  • OBJECT_ID (again)
  • CHANGE_RELEASE_DATE (again)
  • CLASS (again)
  • OLD_VALUE
  • ATTID
  • CHANGE_ID
  • OBJECT_ID(再次)
  • CHANGE_RELEASE_DATE(再次)
  • 类(再次)
  • OLD_VALUE

ATTID is indexed but how sleective is that index? The optimizer probably doesn't think it's very selective. ATTID is also in a compound index with CHANGE_ID and OLD_VALUE but none of them are the leading columns, so that's not very useful. And we've discussed CLASS, CHANGE_RELEASE_DATE and OBJECT_ID already.

ATTID 已编入索引,但该索引的选择性如何?优化器可能认为它不是很有选择性。ATTID 也位于具有 CHANGE_ID 和 OLD_VALUE 的复合索引中,但它们都不是前导列,因此这不是很有用。我们已经讨论了 CLASS、CHANGE_RELEASE_DATE 和 OBJECT_ID。

The optimizer will only choose to use an index if it is cheaper (fewer reads) than a table scan. This usually means WHERE clause criteria need to map to the leading(i.e. leftmost) columns of an index. This could be the case with OBJECT_ID and ATTID in the sub-query except that

如果索引比表扫描更便宜(读取更少),优化器只会选择使用索引。这通常意味着 WHERE 子句标准需要映射到索引的前导(即最左侧)列。子查询中的 OBJECT_ID 和 ATTID 可能就是这种情况,除了

  1. The execution plan would have to do an INDEX SKIP SCAN because REDLINE_MSATTRIBUTE_INDEX1 has CHANGE_ID between the two columns
  2. The database has to go to the table anyway to get the CLASS and the CHANGE_RELEASE_DATE.
  1. 执行计划必须执行 INDEX SKIP SCAN,因为 REDLINE_MSATTRIBUTE_INDEX1 在两列之间具有 CHANGE_ID
  2. 无论如何,数据库必须转到表以获取 CLASS 和 CHANGE_RELEASE_DATE。

So, you might get some improvement by building an index on (CHANGE_RELEASE_DATE, CLASS, OBJECT_ID, ATTID). But as I said upfront, without knowing more about your situation these are just ill-informed guesses.

因此,您可能会通过在(CHANGE_RELEASE_DATE, CLASS, OBJECT_ID, ATTID). 但正如我前面所说的,在不了解您的情况的情况下,这些只是不明智的猜测。

回答by David Aldridge

If the rows are in a different order in the two tables then the indexes in the two systems can have different clustering factors, and hence difference estimated costs for index access. Check the table and index statistics, including the clustering factor, to see if there are significant differences.

如果两个表中行的顺序不同,则两个系统中的索引可能具有不同的聚类因子,因此索引访问的估计成本也不同。检查表和索引统计信息,包括聚簇因子,看看是否存在显着差异。

Also, do either of the systems' explain plans mention dynamic sampling?

另外,两个系统的解释计划是否都提到了动态抽样?

回答by haki

when oracle has an index and decides to use/not use it it's might be because 1) you may have different setting for OPTIMIZER_MODE - make sure it's not on RBO. 2) the data is different - in this case oracle might evaluate the query stats differently. 3) the data is the same but the stats are not up to date. in this case - gather stats

当 oracle 有索引并决定使用/不使用它时,可能是因为 1) 您可能对 OPTIMIZER_MODE 有不同的设置 - 确保它不在 RBO 上。2) 数据不同 - 在这种情况下,oracle 可能会以不同的方式评估查询统计信息。3)数据相同,但统计数据不是最新的。在这种情况下 - 收集统计信息

dbms_stats.gather_table_stats('your3000Ktable',cascade=>true);

4) there are allot more reasons why oracle will not use the index on one environment, i'll suggest comparing parameters (such as OPTIMIZER_ INDEX_COST_ADJ etc...)

4) oracle 不在一个环境中使用索引的原因有很多,我建议比较参数(例如 OPTIMIZER_INDEX_COST_ADJ 等...)

回答by haki

One immediate issue is this piece SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355 (This piece of code will run for every row coming back and it doesn't need to... a better way of doing this is using a single cartesian table so it only runs once and returns a static value to compare....

一个迫在眉睫的问题是这段 SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355(这段代码将针对返回的每一行运行,它不需要......这样做的更好方法是使用单个笛卡尔表,因此它只运行一次并返回一个静态值进行比较....

Example 1:

示例 1:

Select * From Table1, (Select Sysdate As Compare From Dual) Table2 Where Table1.Date > Table2.Compare.

Is always faster than Select * from Table1 Where Date > Sysdate -- Sysdate will get called for each row as it is dynamic function based value. The earlier example will resolve once to a literal and drastically faster. and i believe this is definitely once piece hurting your query and forcing a table scan.

总是比 Select * from Table1 快,其中 Date > Sysdate -- 每行都会调用 Sysdate,因为它是基于动态函数的值。前面的示例将一次性解析为文字,并且速度大大加快。我相信这绝对是一次伤害您的查询并强制进行表扫描的部分。

I also believe this is a more efficient way to execute the query.

我也相信这是执行查询的更有效方法。

Select  
            REDLINE_MSATTRIBUTE.ATTID
           ,REDLINE_MSATTRIBUTE.VALUE
  From 
            REDLINE_MSATTRIBUTE
           ,(
                SELECT  ATTID
                        ,CHANGE_ID
                        ,MIN(CHANGE_RELEASE_DATE) RELEASE_DATE
                 FROM   REDLINE_MSATTRIBUTE
                       ,(SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355) T_COMPARE                
                WHERE   CLASS             = 9000
                  And   OBJECT_ID           = 32718015
                  And   CHANGE_RELEASE_DATE > T_COMPARE.RELEASE_DATE
                  And   ATTID IN (1564, 1565)
                GROUP 
                   BY   ATTID,
                        CHANGE_ID
            ) T_DYNAMIC
Where         
        REDLINE_MSATTRIBUTE.ATTID = T_DYNAMIC.ATTID   
    And REDLINE_MSATTRIBUTE.CHANGE_ID = T_DYNAMIC.CHANGE_ID
    And REDLINE_MSATTRIBUTE.RELEASE_DATE = T_DYNAMIC.RELEASE_DATE
    And CLASS     = 9000
    And OBJECT_ID = 32718015
    And OLD_VALUE ='Y'
Order 
   By   REDLINE_MSATTRIBUTE.ATTID,
        REDLINE_MSATTRIBUTE.VALUE;