Oracle - 了解 no_index 提示
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4750597/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Oracle - Understanding the no_index hint
提问by dscl
I'm trying to understand how no_index actually speeds up a query and haven't been able to find documentation online to explain it.
我试图了解 no_index 实际上如何加快查询速度,并且无法在线找到文档来解释它。
For example I have this query that ran extremely slow
例如我有这个查询运行速度非常慢
select *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And one of our DBAs was able to speed it up significantly by doing this
我们的一位 DBA 能够通过这样做显着加快速度
select /*+ NO_INDEX(TAB_000000000019) */ *
from <tablename>
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString' and
Action_='_someAction_' and
Timestamp_ >= trunc(sysdate - 2)
And I can't figure out why? I would like to figure out why this works so I can see if I can apply it to another query (this one a join) to speed it up because it's taking even longer to run.
我不明白为什么?我想弄清楚为什么会这样,所以我可以看看我是否可以将它应用到另一个查询(这个是一个连接)来加速它,因为它需要更长的时间来运行。
Thanks!
谢谢!
** Update ** Here's what I know about the table in the example.
** 更新 ** 以下是我对示例中表格的了解。
- It's a 'partitioned table'
- TAB_000000000019 is the table not a column in it
- field1 is indexed
- 这是一个“分区表”
- TAB_000000000019 是表而不是其中的列
- field1 被索引
回答by skaffman
Oracle's optimizer makes judgements on how best to run a query, and to do this it uses a large number of statistics gathered about the tables and indexes. Based on these stats, it decides whether or not to use an index, or to just do a table scan, for example.
Oracle 的优化器判断如何最好地运行查询,为此它使用收集到的关于表和索引的大量统计信息。例如,根据这些统计信息,它决定是否使用索引,或者只进行表扫描。
Critically, these stats are not automatically up-to-date, because they can be very expensive to gather. In cases where the stats are notup to date, the optimizer can make the "wrong" decision, and perhaps use an index when it would actually be faster to do a table scan.
至关重要的是,这些统计数据不会自动更新,因为收集它们可能非常昂贵。在统计数据不是最新的情况下,优化器可能会做出“错误”的决定,并且可能会在实际上进行表扫描更快时使用索引。
If this is known by the DBA/developer, they can give hints (which is what NO_INDEX
is) to the optimizer, telling it not to use a given index because it's known to slow things down, often due to out-of-date stats.
如果 DBA/开发人员知道这一点,他们可以向NO_INDEX
优化器提供提示(这是什么),告诉它不要使用给定的索引,因为众所周知它会减慢速度,这通常是由于过时的统计数据。
In your example, TAB_000000000019
will refer to an index or a table (I'm guessing an index, since it looks like an auto-generated name).
在您的示例中,TAB_000000000019
将引用索引或表(我猜是索引,因为它看起来像一个自动生成的名称)。
It's a bit of a black art, to be honest, but that's the gist of it, as I understand things.
老实说,这有点像黑色艺术,但据我了解,这就是它的要点。
Disclaimer: I'm not a DBA, but I've dabbled in that area.
免责声明:我不是 DBA,但我涉足该领域。
回答by Dave Costa
Per your update: If field1 is the only indexed field, then the original query was likely doing a fast full scan on that index (i.e. reading through every entry in the index and checking against the filter conditions on field1), then using those results to find the rows in the table and filter on the other conditions. The conditions on field1 are such that an index unique scan or range scan (i.e. looking up specific values or ranges of values in the index) would not be possible.
根据您的更新:如果 field1 是唯一的索引字段,那么原始查询可能会对该索引进行快速完整扫描(即通读索引中的每个条目并检查 field1 上的过滤条件),然后使用这些结果来查找表中的行并过滤其他条件。field1 的条件是索引唯一扫描或范围扫描(即在索引中查找特定值或值范围)将是不可能的。
Likely the optimizer chose this path because there are two filter predicates on field1. The optimizer would calculate estimated selectivity for each of these and then multiply them to determine their combined selectivity. But in many cases this will significantly underestimate the number of rows that will match the condition.
优化器选择这条路径很可能是因为 field1 上有两个过滤谓词。优化器将计算每个这些的估计选择性,然后将它们相乘以确定它们的组合选择性。但在许多情况下,这将大大低估与条件匹配的行数。
The NO_INDEX hint eliminates this option from the optimizer's consideration, so it essentially goes with the plan it thinks is next best -- possibly in this case using partition elimination based on one of the other filter conditions in the query.
NO_INDEX 提示从优化器的考虑中排除了此选项,因此它基本上符合它认为次优的计划——在这种情况下,可能使用基于查询中其他过滤条件之一的分区消除。
回答by René Nyffenegger
Using an index degrades query performance if it results in moredisk IO compared to querying the table with an index.
如果与使用索引查询表相比,使用索引会导致更多的磁盘 IO,那么使用索引会降低查询性能。
This can be demonstrated with a simple table:
这可以用一个简单的表格来证明:
create table tq84_ix_test (
a number(15) primary key,
b varchar2(20),
c number(1)
);
The following block fills 1 Million records into this table. Every 250th record is filled with a rare value
in column b while all the others are filled with frequent value
:
以下块将 100 万条记录填充到此表中。每 250 条记录rare value
在 b 列中填入 a ,而所有其他记录均填入frequent value
:
declare
rows_inserted number := 0;
begin
while rows_inserted < 1000000 loop
if mod(rows_inserted, 250) = 0 then
insert into tq84_ix_test values (
-1 * rows_inserted,
'rare value',
1);
rows_inserted := rows_inserted + 1;
else
begin
insert into tq84_ix_test values (
trunc(dbms_random.value(1, 1e15)),
'frequent value',
trunc(dbms_random.value(0,2))
);
rows_inserted := rows_inserted + 1;
exception when dup_val_on_index then
null;
end;
end if;
end loop;
end;
/
An index is put on the column
一个索引放在列上
create index tq84_index on tq84_ix_test (b);
The same query, but once with index and once without index, differ in performance. Check it out for yourself:
相同的查询,但一次有索引和一次没有索引,性能不同。自己检查一下:
set timing on
select /*+ no_index(tq84_ix_test) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
select /*+ index(tq84_ix_test tq84_index) */
sum(c)
from
tq84_ix_test
where
b = 'frequent value';
Why is it? In the case without the index, alldatabase blocks are read, in sequential order. Usually, this is costly and therefore considered bad. In normal situation, with an index, such a "full table scan" can be reduced to reading say 2 to 5 index database blocks plus reading the one database block that contains the record that the index points to. With the example here, it is different altogether: the entire index is read and for (almost) each entry in the index, a database block is read, too. So, not only is the entire table read, but also the index. Note, that this behaviour would differ if c
were also in the index because in that case Oracle could choose to get the value of c
from the index instead of going the detour to the table.
为什么?在没有索引的情况下,按顺序读取所有数据库块。通常,这是昂贵的,因此被认为是不好的。在正常情况下,使用索引,这样的“全表扫描”可以减少到读取 2 到 5 个索引数据库块加上读取包含索引指向的记录的一个数据库块。对于这里的示例,情况完全不同:读取整个索引,并且(几乎)索引中的每个条目也读取一个数据库块。因此,不仅要读取整个表,还要读取索引。请注意,如果c
也在索引中,这种行为会有所不同,因为在这种情况下,Oracle 可以选择c
从索引中获取 的值,而不是绕道而行到表中。
So, to generalize the issue: if the index does not pick few records then it might be beneficial to not use it.
所以,概括一下这个问题:如果索引没有选择很少的记录,那么不使用它可能是有益的。
回答by walid
Something to note about indexes is that they are precomputed values based on the row order and the data in the field. In this specific case you say that field1 is indexed and you are using it in the query as follows:
关于索引需要注意的是,它们是基于行顺序和字段中的数据预先计算的值。在这种特定情况下,您说 field1 已编入索引,并且您在查询中使用它,如下所示:
where field1_ like '%someGenericString%' and
field1_ <> 'someSpecificString'
In the query snippet above the filter is on both a variable piece of data since the percent (%) character cradles the string and then on another specific string. This means that the default Oracle optimization that doesn't use an optimizer hint will try to find the string inside the indexed field first and also find if the data it is a sub-string of the data in the field, then it will check that the data doesn't match another specific string. After the index is checked the other columns are then checked. This is a very slow process if repeated.
在上面的查询片段中,过滤器既针对可变数据片段,因为百分比 (%) 字符包含字符串,然后针对另一个特定字符串。这意味着不使用优化器提示的默认 Oracle 优化将首先尝试在索引字段中查找字符串,并查找该数据是否是该字段中数据的子字符串,然后它会检查数据与另一个特定字符串不匹配。检查索引后,然后检查其他列。如果重复,这是一个非常缓慢的过程。
The NO_INDEX hint proposed by the DBA removes the optimizer's preference to use an index and will likely allow the optimizer to choose the faster comparisons first and not necessarily force index comparison first and then compare other columns.
DBA 提出的 NO_INDEX 提示消除了优化器对使用索引的偏好,并且可能允许优化器首先选择更快的比较,而不必先强制进行索引比较,然后再比较其他列。
The following is slow because it compares the string and its sub-strings:
以下很慢,因为它比较了字符串及其子字符串:
field1_ like '%someGenericString%'
While the following is faster because it is specific:
虽然以下更快,因为它是特定的:
field1_ like 'someSpecificString'
So the reason to use the NO_INDEX hint is if you have comparisons on the index that slow things down. If the index field is compared against more specific data then the index comparison is usuallyfaster.
所以使用 NO_INDEX 提示的原因是如果您对索引进行了比较会减慢速度。如果将索引字段与更具体的数据进行比较,则索引比较通常会更快。
I say usuallybecause when the indexed field contains more redundant data like in the example @Atish mentions above, it will have to go through a long list of comparison negatives before a positive comparison is returned. Hints produce varying results because both the database design and the data in the tables affect how fast a query performs. So in order to apply hints you need to know if the individual comparisons you hint to the optimizer will be faster on your data set. There are no shortcuts in this process. Applying hints should happen after proper SQL queries have been written because hints should be based on the real data.
我说通常是因为当索引字段包含更多冗余数据时,如上面@Atish 提到的示例,在返回正比较之前,它必须经过一长串比较否定。提示会产生不同的结果,因为数据库设计和表中的数据都会影响查询的执行速度。因此,为了应用提示,您需要知道您向优化器提示的单个比较在您的数据集上是否会更快。这个过程没有捷径可走。应用提示应该在编写正确的 SQL 查询之后发生,因为提示应该基于真实数据。
Check out this hints reference: http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
查看此提示参考:http: //docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm
回答by Rahul Saini
I had read somewhere that using a % in front of query like '%someGenericString%' will lead to Oracle ignoring the INDEX on that field. Maybe that explains why the query is running slow.
我在某处读到过在查询前使用 % 像 '%someGenericString%' 会导致 Oracle 忽略该字段上的 INDEX。也许这解释了为什么查询运行缓慢。
回答by Atish
To add to what Rene' and Dave have said, this is what I have actually observed in a production situation:
补充一下 Rene' 和 Dave 所说的,这是我在生产环境中实际观察到的:
If the condition(s) on the indexed field returns too many matches, Oracle is better off doing a Full Table Scan.
如果索引字段上的条件返回太多匹配项,Oracle 最好进行全表扫描。
We had a report program querying a very large indexed table - the index was on a region code and the query specified the exact region code, so Oracle CBO uses the index.
我们有一个报表程序查询一个非常大的索引表 - 索引位于区域代码上,查询指定了确切的区域代码,因此 Oracle CBO 使用该索引。
Unfortunately, one specific region code accounted for 90% of the tables entries.
不幸的是,一个特定的区域代码占了表格条目的 90%。
As long as the report was run for one of the other (minor) region codes, it completed in less than 30 minutes, but for the major region code it took many hours.
只要针对其他(次要)区域代码之一运行报告,它就会在不到 30 分钟的时间内完成,但对于主要区域代码则需要很多小时。
Adding a hint to the SQL to force a full table scan solved the problem.
向SQL添加提示以强制进行全表扫描解决了问题。
Hope this helps.
希望这可以帮助。