Original URL: http://stackoverflow.com/questions/12565790/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Selecting both MIN and MAX From the Table is slower than expected
Question by RGO
I have a table MYTABLE with a date column SDATE, which is the primary key of the table and has a unique index on it.
When I run this query:
SELECT MIN(SDATE) FROM MYTABLE
it returns the answer instantly. The same happens for:
SELECT MAX(SDATE) FROM MYTABLE
But, if I query both together:
SELECT MIN(SDATE), MAX(SDATE) FROM MYTABLE
it takes much longer to execute. I analyzed the plans and found that when only MIN or MAX is queried, it uses INDEX FULL SCAN (MIN/MAX), but when both are queried at the same time, it does a FULL TABLE SCAN.
Why?
Test Data:
version 11g
create table MYTABLE
(
SDATE DATE not null,
CELL VARCHAR2(10),
data NUMBER
)
tablespace CHIPS
pctfree 10
pctused 40
initrans 1
maxtrans 255
storage
(
initial 64K
minextents 1
maxextents unlimited
);
alter table MYTABLE
add constraint PK_SDATE primary key (SDATE)
using index
tablespace SYSTEM
pctfree 10
initrans 2
maxtrans 255
storage
(
initial 64K
minextents 1
maxextents unlimited
);
Load the table:
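begin
  -- the FOR loop declares i implicitly, so no separate declaration is needed
  for i in 0 .. 100000 loop
    insert into MYTABLE(sdate, cell, data)
    values(sysdate - i/24, 'T' || i, i);
  end loop;
  -- commit once after the load instead of once per row
  commit;
end;
/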
Gather stats:
begin
dbms_stats.gather_table_stats(tabname => 'MYTABLE', ownname => 'SYS');
end;
/
Plan1: (execution plan image not preserved)
Plan2: (execution plan image not preserved)
Accepted answer by avi
The Index Full Scan can only visit one side of the index. When you are doing
SELECT MIN(SDATE), MAX(SDATE) FROM MYTABLE
you are asking it to visit both sides. Therefore, if you want both the minimum and the maximum column value, an Index Full Scan is not viable.
You can find a more detailed analysis here.
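The "one side" argument can be made concrete with a toy Python model. This is an assumption for illustration, not Oracle's implementation: the index leaf level is modeled as a sorted list that a single scan may walk in one direction only, and `one_direction_min_max` is a hypothetical helper. Whichever direction is chosen, one answer appears at the first entry and the other only at the last, so every entry is read.

```python
# Toy model (assumption): the index leaf level is a sorted list, and a single
# scan may walk it in one direction only.

def one_direction_min_max(sorted_keys, forward=True):
    """Return (min, max, entries_read) from one single-direction scan."""
    entries_read = 0
    first = last = None
    walk = sorted_keys if forward else reversed(sorted_keys)
    for key in walk:
        entries_read += 1
        if first is None:
            first = key  # one answer is known after the very first entry...
        last = key       # ...the other is only known at the far end
    if not forward:
        first, last = last, first  # a backward walk meets the max first
    return first, last, entries_read


keys = list(range(1, 100_001))
print(one_direction_min_max(keys))                 # (1, 100000, 100000)
print(one_direction_min_max(keys, forward=False))  # (1, 100000, 100000)
```

In both directions the scan reads all 100,000 entries, which is the cost of asking one pass for both ends.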
Answer by Vincent Malgrat
The explain plans are different: a single MIN or MAX will produce an INDEX FULL SCAN (MIN/MAX), whereas when the two are present you will get an INDEX FULL SCAN or an INDEX FAST FULL SCAN.
To understand the difference, we have to look at the description of a FULL INDEX SCAN:
In a full index scan, the database reads the entire index in order.
In other words, if the index is on a VARCHAR2 field, Oracle will fetch the first block of the index, which would contain, for example, all entries that start with the letter "A", and will read all entries block by block in alphabetical order until the last entry ("A" to "Z"). Oracle can proceed this way because the entries are kept sorted in a B-tree index.
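The in-order read can be sketched with a toy Python model. The helper names are hypothetical, and the model is a simplification: real leaf blocks hold a variable number of entries and are linked by pointers rather than stored in a Python list. A full index scan visits every leaf block from "A" to "Z":

```python
def build_leaf_blocks(sorted_keys, entries_per_block):
    """Split sorted keys into fixed-size 'leaf blocks' (toy model)."""
    return [sorted_keys[i:i + entries_per_block]
            for i in range(0, len(sorted_keys), entries_per_block)]


def full_index_scan(leaf_blocks):
    """Read every leaf block in key order; return (keys, blocks_read)."""
    keys, blocks_read = [], 0
    for block in leaf_blocks:  # walk the leaf chain from "A" towards "Z"
        blocks_read += 1
        keys.extend(block)
    return keys, blocks_read


names = sorted(["Zulu", "Alpha", "Mike", "Bravo", "Yankee", "Charlie"])
blocks = build_leaf_blocks(names, entries_per_block=2)
keys, blocks_read = full_index_scan(blocks)
print(keys)         # ['Alpha', 'Bravo', 'Charlie', 'Mike', 'Yankee', 'Zulu']
print(blocks_read)  # 3
```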
When you see INDEX FULL SCAN (MIN/MAX) in an explain plan, that is the result of an optimization that exploits the fact that, since the entries are sorted, you can stop after reading the first one if you are only interested in the MIN. If you are interested only in the MAX, Oracle can use the same access path, but this time starting from the last entry and reading backwards from "Z" to "A".
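The stop-key idea behind INDEX FULL SCAN (MIN/MAX) can be sketched with a toy Python model, an assumption for illustration rather than Oracle's code. The helpers `index_min` and `index_max` are hypothetical, and leaf blocks are modeled as a list of sorted lists. The MIN probe walks forward and stops at the first entry; the MAX probe walks backward and stops at the first entry it meets:

```python
def build_leaf_blocks(sorted_keys, entries_per_block):
    """Split sorted keys into fixed-size 'leaf blocks' (toy model)."""
    return [sorted_keys[i:i + entries_per_block]
            for i in range(0, len(sorted_keys), entries_per_block)]


def index_min(leaf_blocks):
    """Forward walk that stops at the first entry: (value, blocks_read)."""
    for blocks_read, block in enumerate(leaf_blocks, start=1):
        if block:
            return block[0], blocks_read
    return None, len(leaf_blocks)


def index_max(leaf_blocks):
    """Backward walk that stops at the first entry: (value, blocks_read)."""
    for blocks_read, block in enumerate(reversed(leaf_blocks), start=1):
        if block:
            return block[-1], blocks_read
    return None, len(leaf_blocks)


blocks = build_leaf_blocks(list(range(1, 100_001)), entries_per_block=500)
print(len(blocks))        # 200
print(index_min(blocks))  # (1, 1)
print(index_max(blocks))  # (100000, 1)
```

The block counts ignore the root-to-leaf descent through branch blocks that a real B-tree probe also performs; that adds only a few blocks per probe, far fewer than reading the whole leaf level.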
At the time of writing, a FULL INDEX SCAN has only one direction (either forward or backward) and cannot start from both ends simultaneously. This is why, when you ask for both the min and the max, you get a less efficient access method.
As suggested by other answers, if the query's performance is critical, you could do your own optimization by searching for the min and the max in two distinct queries.
Answer by planben
Try not to select both edges of the index in one query. Instead, write the query differently, like this:
select max_date, min_date
from (select max(sdate) max_date from mytable),
(select min(sdate) min_date from mytable)
This will cause the optimizer to access the index with INDEX FULL SCAN (MIN/MAX) in nested loops (in our case, twice).
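The rewrite splits the work into two single-ended probes whose one-row results are combined into one row. In a toy Python model (an assumption for illustration; the helper names are hypothetical, and neither the branch-block descent nor the join itself is modeled), each probe touches one leaf block, while a one-direction scan that must see both ends touches all of them:

```python
# Toy model (assumption): leaf blocks are a list of non-empty sorted lists.

def build_leaf_blocks(sorted_keys, entries_per_block):
    """Split sorted keys into fixed-size 'leaf blocks'."""
    return [sorted_keys[i:i + entries_per_block]
            for i in range(0, len(sorted_keys), entries_per_block)]


def two_probe_min_max(leaf_blocks):
    """Mimic two separate MIN/MAX probes combined into one row.

    Returns (max_value, min_value, blocks_read), in the same column order
    as "select max_date, min_date". Each probe reads one leaf block.
    """
    if not leaf_blocks:
        return None, None, 0
    max_value = leaf_blocks[-1][-1]  # probe 1: rightmost leaf, last entry
    min_value = leaf_blocks[0][0]    # probe 2: leftmost leaf, first entry
    return max_value, min_value, 2


def single_scan_blocks(leaf_blocks):
    """Blocks a one-direction scan must read to see both ends."""
    return len(leaf_blocks)


blocks = build_leaf_blocks(list(range(1, 100_001)), entries_per_block=500)
print(two_probe_min_max(blocks))   # (100000, 1, 2)
print(single_scan_blocks(blocks))  # 200
```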
Answer by Ben
I have to say that I do not see the same behaviour in 11.2.
I set up a test case as follows (updated from 10k to 1m rows in response to Vincent's comment):
set linesize 130
set pagesize 0
create table mytable ( sdate date );
Table created.
insert into mytable
select sysdate - level
from dual
connect by level <= 1000000;
1000000 rows created.
commit;
Commit complete.
alter table mytable add constraint pk_mytable primary key ( sdate ) using index;
Table altered.
begin
dbms_stats.gather_table_stats( user, 'MYTABLE'
, estimate_percent => 100
, cascade => true
);
end;
/
PL/SQL procedure successfully completed.
Then, executing your queries, I get almost identical-looking explain plans (notice the different types of INDEX FULL SCAN):
explain plan for select min(sdate) from mytable;
Explained.
select * from table(dbms_xplan.display);
Plan hash value: 3877058912
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| PK_MYTABLE | 1 | 8 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
9 rows selected.
explain plan for select min(sdate), max(sdate) from mytable;
Explained.
select * from table(dbms_xplan.display);
Plan hash value: 3812733167
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 252 (0)| 00:00:04 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | INDEX FULL SCAN| PK_MYTABLE | 1000K| 7812K| 252 (0)| 00:00:04 |
-------------------------------------------------------------------------------
9 rows selected.
To quote from a previous answer of mine:
The two most common reasons for a query not using indexes are:
- It's quicker to do a full table scan.
- Poor statistics.
Unless there's something you're not posting in the question, my immediate answer would be that you have not collected statistics on this table, you haven't collected them with a high enough estimate percent, or you've used analyze, which will not help the Cost Based Optimizer, unlike dbms_stats.gather_table_stats.
To quote from the documentation on analyze:
For the collection of most statistics, use the DBMS_STATS package, which lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine tune your statistics collection in other ways. See Oracle Database PL/SQL Packages and Types Reference for more information on the DBMS_STATS package.
Use the ANALYZE statement (rather than DBMS_STATS) for statistics collection not related to the cost-based optimizer: