oracle 创建索引并使用 dbms_stats 计算后，查询执行速度较慢

Question

提问by rirhs

I have a table with 1.5 million rows. I run a query which fetches records having non repeating values in a column. I am observing a behaviour in which after creating indexes the performance of the query degrades. I also used dbms_stats with 100% estimate percentage(compute mode) to gather statistics so that oracle 11g CBO makes a more informed decision for the query plan, but it doesnt improve the query execution time.

我有一张有 150 万行的表。我运行一个查询，该查询获取列中具有非重复值的记录。我观察到在创建索引后查询性能下降的行为。我还使用了 dbms_stats 和 100% 估计百分比（计算模式）来收集统计信息，以便 oracle 11g CBO 为查询计划做出更明智的决定，但它不会提高查询执行时间。

 SQL> desc tab3;
 Name                 Null?    Type
 ----------------------------------------------
 COL1                          NUMBER(38)
 COL2                          VARCHAR2(100)
 COL3                          VARCHAR2(36)
 COL4                          VARCHAR2(36)
 COL5                          VARCHAR2(4000)
 COL6                          VARCHAR2(4000)
 MEASURE_0                     VARCHAR2(4000)
 MEASURE_1                     VARCHAR2(4000)
 MEASURE_2                     VARCHAR2(4000)
 MEASURE_3                     VARCHAR2(4000)
 MEASURE_4                     VARCHAR2(4000)
 MEASURE_5                     VARCHAR2(4000)
 MEASURE_6                     VARCHAR2(4000)
 MEASURE_7                     VARCHAR2(4000)
 MEASURE_8                     VARCHAR2(4000)
 MEASURE_9                     VARCHAR2(4000)

The column measure_0has 0.4 million unique values.

该列measure_0有 40 万个唯一值。

SQL> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc;

  COUNT(*)
----------
    403664

The following is the query with the execution plan, please note there are no indexes on the table.

以下是带有执行计划的查询，请注意表上没有索引。

SQL> set autotrace traceonly;

SQL> SELECT * FROM (
  2     SELECT
  3             (ROWNUM -1) AS COL1,
  4             ft.COL1         AS OLD_COL1,
  5             ft.COL2,
  6             ft.COL3,
  7             ft.COL4,
  8             ft.COL5,
  9             ft.COL6,
 10             ft.MEASURE_0,
 11             ft.MEASURE_1,
 12             ft.MEASURE_2,
 13             ft.MEASURE_3,
 14             ft.MEASURE_4,
 15             ft.MEASURE_5,
 16             ft.MEASURE_6,
 17             ft.MEASURE_7,
 18             ft.MEASURE_8,
 19             ft.MEASURE_9
 20     FROM tab3 ft
 21     WHERE MEASURE_0 IN
 22      (
 23             SELECT MEASURE_0
 24             FROM tab3
 25             GROUP BY MEASURE_0
 26             HAVING COUNT(*) = 1
 27      )
 28  ) ABC WHERE COL1 >= 0 AND COL1 <=449;

450 rows selected.

Elapsed: 00:00:01.90

Execution Plan
----------------------------------------------------------
Plan hash value: 3115757351

------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1243 |    28M|   717K  (1)| 02:23:29 |
|*  1 |  VIEW                   |          |  1243 |    28M|   717K  (1)| 02:23:29 |
|   2 |   COUNT                 |          |       |       |            |          |
|*  3 |    HASH JOIN            |          |  1243 |    30M|   717K  (1)| 02:23:29 |
|   4 |     VIEW                | VW_NSO_1 |  1686K|  3219M|  6274   (2)| 00:01:16 |
|*  5 |      FILTER             |          |       |       |            |          |
|   6 |       HASH GROUP BY     |          |     1 |  3219M|  6274   (2)| 00:01:16 |
|   7 |        TABLE ACCESS FULL| TAB3     |  1686K|  3219M|  6196   (1)| 00:01:15 |
|   8 |     TABLE ACCESS FULL   | TAB3     |  1686K|    37G|  6211   (1)| 00:01:15 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   3 - access("MEASURE_0"="MEASURE_0")
   5 - filter(COUNT(*)=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
        354  recursive calls
          0  db block gets
      46518  consistent gets
      45122  physical reads
          0  redo size
      43972  bytes sent via SQL*Net to client
        715  bytes received via SQL*Net from client
         31  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

The query takes up 1.90seconds. If I run the query again it takes 1.66seconds. Why does it take more time in the first run?

查询需要1.90秒。如果我再次运行查询，则需要1.66秒。为什么第一次运行需要更多时间？

In order to speed it up I created indexes on the two columns used in the query.

为了加快速度，我在查询中使用的两列上创建了索引。

SQL> create index ind_tab3_orgid on tab3(COL1);

Index created.

Elapsed: 00:00:01.68
SQL> create index ind_tab3_msr_0 on tab3(measure_0);

Index created.

Elapsed: 00:00:01.83

When I fired the query after this for the first time it took a whooping 21seconds to come back. Whereas subsequent runs bought it to 2.9seconds. Why does oracle take so much time in the first run, is it warming up or something.. baffles me!

当我第一次在此之后触发查询时，它花了惊人的21秒才返回。而随后的运行将其购买到2.9秒。为什么oracle在第一次运行时要花这么多时间，是热身还是什么..让我感到困惑！

This is the plan when it takes 2.9 seconds-

这是需要2.9秒时的计划——

450 rows selected.

Elapsed: 00:00:02.92

Execution Plan
----------------------------------------------------------
Plan hash value: 240271480

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |  1243 |    28M|   711K  (1)| 02:22:15 |
|*  1 |  VIEW                          |                |  1243 |    28M|   711K  (1)| 02:22:15 |
|   2 |   COUNT                        |                |       |       |            |          |
|   3 |    NESTED LOOPS                |                |       |       |            |          |
|   4 |     NESTED LOOPS               |                |  1243 |    30M|   711K  (1)| 02:22:15 |
|   5 |      VIEW                      | VW_NSO_1       |  1686K|  3219M|  6274   (2)| 00:01:16 |
|*  6 |       FILTER                   |                |       |       |            |          |
|   7 |        HASH GROUP BY           |                |     1 |  3219M|  6274   (2)| 00:01:16 |
|   8 |         TABLE ACCESS FULL      | TAB3           |  1686K|  3219M|  6196   (1)| 00:01:15 |
|*  9 |      INDEX RANGE SCAN          | IND_TAB3_MSR_0 |  1243 |       |     2   (0)| 00:00:01 |
|  10 |     TABLE ACCESS BY INDEX ROWID| TAB3           |  1243 |    28M|    44   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   6 - filter(COUNT(*)=1)
   9 - access("MEASURE_0"="MEASURE_0")

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     660054  consistent gets
      22561  physical reads
          0  redo size
      44358  bytes sent via SQL*Net to client
        715  bytes received via SQL*Net from client
         31  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

I was expecting the time to be lower than when the table was non indexed. Why does the indexed version of the table takes more time to fetch the results than the non indexed version? If I am not wrong it is the TABLE ACCESS BY INDEX ROWID that is taking up time. Can I enforce oracle to use TABLE ACCESS FULL?

我期望时间比表未编入索引时要短。为什么表的索引版本比非索引版本需要更多时间来获取结果？如果我没记错的话，是 TABLE ACCESS BY INDEX ROWID 占用了时间。我可以强制 oracle 使用 TABLE ACCESS FULL 吗？

I then gathered statistics on the table so that CBO improves the plan with compute option. So now the statistics would be accurate.

然后我收集了表格上的统计数据，以便 CBO 使用计算选项改进计划。所以现在统计是准确的。

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP', tabname=>'TAB3',estimate_percent=>null,cascade=>true);

PL/SQL procedure successfully completed.

Elapsed: 00:01:02.47
SQL> set autotrace off;
SQL> select COLUMN_NAME,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ;

COLUMN_NAME                    NUM_DISTINCT SAMPLE_SIZE HISTOGRAM       LAST_ANALYZED
------------------------------ ------------ ----------- --------------- ---------
COL1                                1502257     1502257 NONE            27-JUN-12
COL2                                      0             NONE            27-JUN-12
COL3                                      1     1502257 NONE            27-JUN-12
COL4                                      0             NONE            27-JUN-12
COL5                                1502257     1502257 NONE            27-JUN-12
COL6                                1502257     1502257 NONE            27-JUN-12
MEASURE_0                            405609     1502257 HEIGHT BALANCED 27-JUN-12
MEASURE_1                            128570     1502257 NONE            27-JUN-12
MEASURE_2                           1502257     1502257 NONE            27-JUN-12
MEASURE_3                            185657     1502257 NONE            27-JUN-12
MEASURE_4                               901     1502257 NONE            27-JUN-12
MEASURE_5                                17     1502257 NONE            27-JUN-12
MEASURE_6                              2202     1502257 NONE            27-JUN-12
MEASURE_7                              2193     1502257 NONE            27-JUN-12
MEASURE_8                                21     1502257 NONE            27-JUN-12
MEASURE_9                             27263     1502257 NONE            27-JUN-12

I again ran the query

我再次运行查询

450 rows selected.

Elapsed: 00:00:02.95

Execution Plan
----------------------------------------------------------
Plan hash value: 240271480

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |    31M|   718G|  8046   (2)| 00:01:37 |
|*  1 |  VIEW                          |                |    31M|   718G|  8046   (2)| 00:01:37 |
|   2 |   COUNT                        |                |       |       |            |          |
|   3 |    NESTED LOOPS                |                |       |       |            |          |
|   4 |     NESTED LOOPS               |                |    31M|    62G|  8046   (2)| 00:01:37 |
|   5 |      VIEW                      | VW_NSO_1       |  4057 |  7931K|  6263   (2)| 00:01:16 |
|*  6 |       FILTER                   |                |       |       |            |          |
|   7 |        HASH GROUP BY           |                |     1 | 20285 |  6263   (2)| 00:01:16 |
|   8 |         TABLE ACCESS FULL      | TAB3           |  1502K|  7335K|  6193   (1)| 00:01:15 |
|*  9 |      INDEX RANGE SCAN          | IND_TAB3_MSR_0 |     4 |       |     2   (0)| 00:00:01 |
|  10 |     TABLE ACCESS BY INDEX ROWID| TAB3           |   779K|    75M|     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1">=0 AND "COL1"<=449)
   6 - filter(COUNT(*)=1)
   9 - access("MEASURE_0"="MEASURE_0")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     660054  consistent gets
      22561  physical reads
          0  redo size
      44358  bytes sent via SQL*Net to client
        715  bytes received via SQL*Net from client
         31  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        450  rows processed

This time the query came back in 2.9seconds (sometimes it took 3.9seconds too).

这次查询在2.9秒内返回（有时也需要3.9秒）。

My goal is to minimize the query execution time as much as possible. But after adding indexes or after computing statistics the query time just kept increasing. Why is this happening and how can I improve even by keeping the indexes?

我的目标是尽可能减少查询执行时间。但是在添加索引或计算统计后，查询时间一直在增加。为什么会发生这种情况，即使保留索引，我该如何改进？

Answer 1

回答by Vincent Malgrat

First of all, let me quote Tom Kyte:

首先，让我引用Tom Kyte 的话：

just keep saying to yourself over and over
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"

一遍又一遍地对自己说
“全扫描不邪恶，索引不好”
“全扫描不邪恶，索引不好”
“全扫描不邪恶，索引不好”
“全扫描不邪恶，索引不好”
“完整扫描不邪恶，索引不好”
“全扫描不邪恶，索引不好”

Indexes will not alwaysimprove performance, they are not the magical silver bullet (as if such a thing ever existed :)

索引不会总是提高性能，它们不是神奇的灵丹妙药（好像曾经存在过这样的东西：）

Now you're asking whyit takes longer with your index. The answer is pretty simple:

现在您要问为什么索引需要更长的时间。答案很简单：

with the full table scan: 46518consistent gets
with your index: 660054consistent gets

全表扫描： 46518一致获取
与您的索引：660054一致获取

In other words: Oracle perform more read operations with your index than with the full table scan. This happens because:

换句话说：与全表扫描相比，Oracle 对您的索引执行更多的读取操作。发生这种情况是因为：

FULL TABLE SCAN reads are bulk operations (many blocks at a time) and are therefore an efficient way to read lots of data
sometimes when you read from an index you end up reading the exact same data block more than once.

FULL TABLE SCAN 读取是批量操作（一次多个块），因此是读取大量数据的有效方法
有时，当您从索引中读取数据时，您最终会多次读取完全相同的数据块。

As to why the optimizer chose to use this obviously non-efficient index, this is because even with esimate_percent=100and full histograms (which you have collected on the MEASURE_0column), some data distribution still can not be reliably expressed by the simple analysis of the optimizer. In particular, cross-column and cross-table dependency is not well-understood by the analyzer. This leads to wrong estimates, which lead to poor plan choice.

至于为什么优化器选择使用这个明显效率不高的索引，这是因为即使有esimate_percent=100完整的直方图（你已经在MEASURE_0列上收集了），优化器的简单分析仍然不能可靠地表达一些数据分布。特别是，分析器不能很好地理解跨列和跨表依赖性。这会导致错误的估计，从而导致糟糕的计划选择。

Edit:it seems the working hypothesis of the CBO are not working at all for this self-join (your last query expects 31 million rows whereas only 450 are selected !). This is quite puzzling since the table has only 1.5 M rows. What version of Oracle are you using?

编辑：似乎 CBO 的工作假设对于这种自联接根本不起作用（您的最后一个查询预计有 3100 万行，而只选择了 450 行！）。这很令人费解，因为该表只有 150 万行。您使用的是哪个版本的 Oracle？

I think you will find that you can remove the self-join and therefore improve query performance with analytics:

我认为您会发现您可以删除自联接，从而通过分析提高查询性能：

SELECT * FROM (
   SELECT (ROWNUM -1) AS COL1, ABC.*
     FROM (
      SELECT
              ft.COL1 AS OLD_COL1,
              [...],
              COUNT(*) OVER (PARTITION BY MEASURE_O) nb_0
      FROM tab3 ft
      ) ABC 
    WHERE nb_0 = 1
      AND ROWNUM - 1 <= 449
   ) v
 WHERE COL1 >= 0;

You were also asking why the first time a query is run takes more time in general. This is because there are caches at work. On the database level there is the SGA where all blocks are first copied from disk and then can be read multiple times (the first time a block is queried is always a physical read). Then some systems also have an independent system cache that will return data faster if it has been read recently.

您还问为什么第一次运行查询通常需要更多时间。这是因为有缓存在工作。在数据库级别有 SGA，其中所有块首先从磁盘复制，然后可以多次读取（第一次查询块总是物理读取）。然后一些系统还有一个独立的系统缓存，如果最近读取过数据，它会更快地返回数据。

For further reading:

进一步阅读：

an explication of a data discrepancy that can lead to wrong estimatesand a solution involving SQL profiles.

对可能导致错误估计的数据差异的解释以及涉及 SQL 配置文件的解决方案。

Answer 2

回答by Tebbe

How does this code perform?

这段代码如何执行？

SELECT ROWNUM - 1      AS col1
,      ft.col1         AS old_col1
,      ft.col2
,      ft.col3
,      ft.col4
,      ft.col5
,      ft.col6
,      ft.measure_0
,      ft.measure_1
,      ft.measure_2
,      ft.measure_3
,      ft.measure_4
,      ft.measure_5
,      ft.measure_6
,      ft.measure_7
,      ft.measure_8
,      ft.measure_9
FROM   tab3 ft
WHERE  NOT EXISTS (SELECT NULL
                   FROM   tab3   ft_prime
                   WHERE  ft_prime.measure_0 = ft.measure_0
                   AND    ft_prime.ROWID <> ft.ROWID)
AND    ROWNUM <= 450;

oracle 创建索引并使用 dbms_stats 计算后，查询执行速度较慢

提问by rirhs

回答by Vincent Malgrat

回答by Tebbe

相关推荐

最近更新

标签

oracle 创建索引并使用 dbms_stats 计算后，查询执行速度较慢

提问by rirhs

回答by Vincent Malgrat

回答by Tebbe

相关推荐

如何判断 Java 是 Sun / Oracle / OpenJDK？

oracle convert timestamps to other time masks

将外部选择行变量传递给 oracle 中的内部选择

oracle 如何从java代码中的数据库表中获取Max id

相关推荐

最近更新

标签