oracle 创建索引并使用 dbms_stats 计算后,查询执行速度较慢
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11229258/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Query executes slower after indexes are created and dbms_stats compute is used
提问by rirhs
I have a table with 1.5 million rows. I run a query which fetches records having non repeating values in a column. I am observing a behaviour in which after creating indexes the performance of the query degrades. I also used dbms_stats with 100% estimate percentage(compute mode) to gather statistics so that oracle 11g CBO makes a more informed decision for the query plan, but it doesnt improve the query execution time.
我有一张有 150 万行的表。我运行一个查询,该查询获取列中具有非重复值的记录。我观察到在创建索引后查询性能下降的行为。我还使用了 dbms_stats 和 100% 估计百分比(计算模式)来收集统计信息,以便 oracle 11g CBO 为查询计划做出更明智的决定,但它不会提高查询执行时间。
SQL> desc tab3;
Name Null? Type
----------------------------------------------
COL1 NUMBER(38)
COL2 VARCHAR2(100)
COL3 VARCHAR2(36)
COL4 VARCHAR2(36)
COL5 VARCHAR2(4000)
COL6 VARCHAR2(4000)
MEASURE_0 VARCHAR2(4000)
MEASURE_1 VARCHAR2(4000)
MEASURE_2 VARCHAR2(4000)
MEASURE_3 VARCHAR2(4000)
MEASURE_4 VARCHAR2(4000)
MEASURE_5 VARCHAR2(4000)
MEASURE_6 VARCHAR2(4000)
MEASURE_7 VARCHAR2(4000)
MEASURE_8 VARCHAR2(4000)
MEASURE_9 VARCHAR2(4000)
The column measure_0
has 0.4 million unique values.
该列measure_0
有 40 万个唯一值。
SQL> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc;
COUNT(*)
----------
403664
The following is the query with the execution plan, please note there are no indexes on the table.
以下是带有执行计划的查询,请注意表上没有索引。
SQL> set autotrace traceonly;
SQL> SELECT * FROM (
2 SELECT
3 (ROWNUM -1) AS COL1,
4 ft.COL1 AS OLD_COL1,
5 ft.COL2,
6 ft.COL3,
7 ft.COL4,
8 ft.COL5,
9 ft.COL6,
10 ft.MEASURE_0,
11 ft.MEASURE_1,
12 ft.MEASURE_2,
13 ft.MEASURE_3,
14 ft.MEASURE_4,
15 ft.MEASURE_5,
16 ft.MEASURE_6,
17 ft.MEASURE_7,
18 ft.MEASURE_8,
19 ft.MEASURE_9
20 FROM tab3 ft
21 WHERE MEASURE_0 IN
22 (
23 SELECT MEASURE_0
24 FROM tab3
25 GROUP BY MEASURE_0
26 HAVING COUNT(*) = 1
27 )
28 ) ABC WHERE COL1 >= 0 AND COL1 <=449;
450 rows selected.
Elapsed: 00:00:01.90
Execution Plan
----------------------------------------------------------
Plan hash value: 3115757351
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 717K (1)| 02:23:29 |
|* 1 | VIEW | | 1243 | 28M| 717K (1)| 02:23:29 |
| 2 | COUNT | | | | | |
|* 3 | HASH JOIN | | 1243 | 30M| 717K (1)| 02:23:29 |
| 4 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 7 | TABLE ACCESS FULL| TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 37G| 6211 (1)| 00:01:15 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
3 - access("MEASURE_0"="MEASURE_0")
5 - filter(COUNT(*)=1)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
354 recursive calls
0 db block gets
46518 consistent gets
45122 physical reads
0 redo size
43972 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
The query takes up 1.90seconds. If I run the query again it takes 1.66seconds. Why does it take more time in the first run?
查询需要1.90秒。如果我再次运行查询,则需要1.66秒。为什么第一次运行需要更多时间?
In order to speed it up I created indexes on the two columns used in the query.
为了加快速度,我在查询中使用的两列上创建了索引。
SQL> create index ind_tab3_orgid on tab3(COL1);
Index created.
Elapsed: 00:00:01.68
SQL> create index ind_tab3_msr_0 on tab3(measure_0);
Index created.
Elapsed: 00:00:01.83
When I fired the query after this for the first time it took a whooping 21seconds to come back. Whereas subsequent runs bought it to 2.9seconds. Why does oracle take so much time in the first run, is it warming up or something.. baffles me!
当我第一次在此之后触发查询时,它花了惊人的21秒才返回。而随后的运行将其购买到2.9秒。为什么oracle在第一次运行时要花这么多时间,是热身还是什么..让我感到困惑!
This is the plan when it takes 2.9 seconds-
这是需要2.9秒时的计划——
450 rows selected.
Elapsed: 00:00:02.92
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 711K (1)| 02:22:15 |
|* 1 | VIEW | | 1243 | 28M| 711K (1)| 02:22:15 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 1243 | 30M| 711K (1)| 02:22:15 |
| 5 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 1243 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 1243 | 28M| 44 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
I was expecting the time to be lower than when the table was non indexed. Why does the indexed version of the table takes more time to fetch the results than the non indexed version? If I am not wrong it is the TABLE ACCESS BY INDEX ROWID that is taking up time. Can I enforce oracle to use TABLE ACCESS FULL?
我期望时间比表未编入索引时要短。为什么表的索引版本比非索引版本需要更多时间来获取结果?如果我没记错的话,是 TABLE ACCESS BY INDEX ROWID 占用了时间。我可以强制 oracle 使用 TABLE ACCESS FULL 吗?
I then gathered statistics on the table so that CBO improves the plan with compute option. So now the statistics would be accurate.
然后我收集了表格上的统计数据,以便 CBO 使用计算选项改进计划。所以现在统计是准确的。
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP', tabname=>'TAB3',estimate_percent=>null,cascade=>true);
PL/SQL procedure successfully completed.
Elapsed: 00:01:02.47
SQL> set autotrace off;
SQL> select COLUMN_NAME,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM LAST_ANALYZED
------------------------------ ------------ ----------- --------------- ---------
COL1 1502257 1502257 NONE 27-JUN-12
COL2 0 NONE 27-JUN-12
COL3 1 1502257 NONE 27-JUN-12
COL4 0 NONE 27-JUN-12
COL5 1502257 1502257 NONE 27-JUN-12
COL6 1502257 1502257 NONE 27-JUN-12
MEASURE_0 405609 1502257 HEIGHT BALANCED 27-JUN-12
MEASURE_1 128570 1502257 NONE 27-JUN-12
MEASURE_2 1502257 1502257 NONE 27-JUN-12
MEASURE_3 185657 1502257 NONE 27-JUN-12
MEASURE_4 901 1502257 NONE 27-JUN-12
MEASURE_5 17 1502257 NONE 27-JUN-12
MEASURE_6 2202 1502257 NONE 27-JUN-12
MEASURE_7 2193 1502257 NONE 27-JUN-12
MEASURE_8 21 1502257 NONE 27-JUN-12
MEASURE_9 27263 1502257 NONE 27-JUN-12
I again ran the query
我再次运行查询
450 rows selected.
Elapsed: 00:00:02.95
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31M| 718G| 8046 (2)| 00:01:37 |
|* 1 | VIEW | | 31M| 718G| 8046 (2)| 00:01:37 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 31M| 62G| 8046 (2)| 00:01:37 |
| 5 | VIEW | VW_NSO_1 | 4057 | 7931K| 6263 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 20285 | 6263 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1502K| 7335K| 6193 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 4 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 779K| 75M| 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
This time the query came back in 2.9seconds (sometimes it took 3.9seconds too).
这次查询在2.9秒内返回(有时也需要3.9秒)。
My goal is to minimize the query execution time as much as possible. But after adding indexes or after computing statistics the query time just kept increasing. Why is this happening and how can I improve even by keeping the indexes?
我的目标是尽可能减少查询执行时间。但是在添加索引或计算统计后,查询时间一直在增加。为什么会发生这种情况,即使保留索引,我该如何改进?
回答by Vincent Malgrat
First of all, let me quote Tom Kyte:
首先,让我引用Tom Kyte 的话:
just keep saying to yourself over and over
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
"full scans are not evil, indexes are not good"
一遍又一遍地对自己说
“全扫描不邪恶,索引不好”
“全扫描不邪恶,索引不好”
“全扫描不邪恶,索引不好”
“全扫描不邪恶,索引不好”
“完整扫描不邪恶,索引不好”
“全扫描不邪恶,索引不好”
Indexes will not alwaysimprove performance, they are not the magical silver bullet (as if such a thing ever existed :)
索引不会总是提高性能,它们不是神奇的灵丹妙药(好像曾经存在过这样的东西:)
Now you're asking whyit takes longer with your index. The answer is pretty simple:
现在您要问为什么索引需要更长的时间。答案很简单:
- with the full table scan: 46518consistent gets
- with your index: 660054consistent gets
- 全表扫描: 46518一致获取
- 与您的索引:660054一致获取
In other words: Oracle perform more read operations with your index than with the full table scan. This happens because:
换句话说:与全表扫描相比,Oracle 对您的索引执行更多的读取操作。发生这种情况是因为:
- FULL TABLE SCAN reads are bulk operations (many blocks at a time) and are therefore an efficient way to read lots of data
- sometimes when you read from an index you end up reading the exact same data block more than once.
- FULL TABLE SCAN 读取是批量操作(一次多个块),因此是读取大量数据的有效方法
- 有时,当您从索引中读取数据时,您最终会多次读取完全相同的数据块。
As to why the optimizer chose to use this obviously non-efficient index, this is because even with esimate_percent=100
and full histograms (which you have collected on the MEASURE_0
column), some data distribution still can not be reliably expressed by the simple analysis of the optimizer. In particular, cross-column and cross-table dependency is not well-understood by the analyzer. This leads to wrong estimates, which lead to poor plan choice.
至于为什么优化器选择使用这个明显效率不高的索引,这是因为即使有esimate_percent=100
完整的直方图(你已经在MEASURE_0
列上收集了),优化器的简单分析仍然不能可靠地表达一些数据分布。特别是,分析器不能很好地理解跨列和跨表依赖性。这会导致错误的估计,从而导致糟糕的计划选择。
Edit:it seems the working hypothesis of the CBO are not working at all for this self-join (your last query expects 31 million rows whereas only 450 are selected !). This is quite puzzling since the table has only 1.5 M rows. What version of Oracle are you using?
编辑:似乎 CBO 的工作假设对于这种自联接根本不起作用(您的最后一个查询预计有 3100 万行,而只选择了 450 行!)。这很令人费解,因为该表只有 150 万行。您使用的是哪个版本的 Oracle?
I think you will find that you can remove the self-join and therefore improve query performance with analytics:
我认为您会发现您可以删除自联接,从而通过分析提高查询性能:
SELECT * FROM (
SELECT (ROWNUM -1) AS COL1, ABC.*
FROM (
SELECT
ft.COL1 AS OLD_COL1,
[...],
COUNT(*) OVER (PARTITION BY MEASURE_O) nb_0
FROM tab3 ft
) ABC
WHERE nb_0 = 1
AND ROWNUM - 1 <= 449
) v
WHERE COL1 >= 0;
You were also asking why the first time a query is run takes more time in general. This is because there are caches at work. On the database level there is the SGA where all blocks are first copied from disk and then can be read multiple times (the first time a block is queried is always a physical read). Then some systems also have an independent system cache that will return data faster if it has been read recently.
您还问为什么第一次运行查询通常需要更多时间。这是因为有缓存在工作。在数据库级别有 SGA,其中所有块首先从磁盘复制,然后可以多次读取(第一次查询块总是物理读取)。然后一些系统还有一个独立的系统缓存,如果最近读取过数据,它会更快地返回数据。
For further reading:
进一步阅读:
- an explication of a data discrepancy that can lead to wrong estimatesand a solution involving SQL profiles.
- 对可能导致错误估计的数据差异的解释以及涉及 SQL 配置文件的解决方案。
回答by Tebbe
How does this code perform?
这段代码如何执行?
SELECT ROWNUM - 1 AS col1
, ft.col1 AS old_col1
, ft.col2
, ft.col3
, ft.col4
, ft.col5
, ft.col6
, ft.measure_0
, ft.measure_1
, ft.measure_2
, ft.measure_3
, ft.measure_4
, ft.measure_5
, ft.measure_6
, ft.measure_7
, ft.measure_8
, ft.measure_9
FROM tab3 ft
WHERE NOT EXISTS (SELECT NULL
FROM tab3 ft_prime
WHERE ft_prime.measure_0 = ft.measure_0
AND ft_prime.ROWID <> ft.ROWID)
AND ROWNUM <= 450;