Oracle 10g - 优化 WHERE IS NOT NULL

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/721556/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 01:48:02  来源:igfitidea点击:

Oracle 10g - optimize WHERE IS NOT NULL

oracleoptimizationnull

提问by Jim Evans

We have Oracle 10g and we need to query 1 table (no joins) and filter out rows where 1 of the columns is null. When we do this - WHERE OurColumn IS NOT NULL - we get a full table scan on a very large table - BAD BAD BAD. The column has an index on it but it gets ignored in this instance. Are there any solutions to this?

我们有 Oracle 10g,我们需要查询 1 个表(无连接)并过滤掉其中 1 列为空的行。当我们这样做时 - WHERE OurColumn IS NOT NULL - 我们对一个非常大的表进行全表扫描 - BAD BAD BAD。该列上有一个索引,但在此实例中被忽略。有什么解决办法吗?

Thanks

谢谢

回答by Quassnoi

The optimizer thinks that the full table scan will be better.

优化器认为全表扫描会更好。

If there are just a few NULLrows, the optimizer is right.

如果只有几NULL行,优化器是正确的。

If you are absolutely sure that the index access will be faster (that is, you have more than 75%rows with col1 IS NULL), then hint your query:

如果您绝对确定索引访问会更快(也就是说,您的75%行数多于col1 IS NULL),则提示您的查询:

SELECT  /*+ INDEX (t index_name_on_col1) */
        *
FROM    mytable t
WHERE   col1 IS NOT NULL

Why 75%?

为什么75%

Because using INDEX SCANto retrieve values not covered by the index implies a hidden join on ROWID, which costs about 4times as much as table scan.

因为INDEX SCAN用于检索索引未涵盖的值意味着在 上进行隐藏连接ROWID,其成本大约4是表扫描的两倍。

If the index range includes more than 25%of rows, the table scan is usually faster.

如果索引范围包含多个25%行,则表扫描通常会更快。

As mentioned by Tony Andrews, clustering factor is more accurate method to measure this value, but 25%is still a good rule of thumb.

正如 所提到的Tony Andrews,聚类因子是衡量这个值的更准确的方法,但25%仍然是一个很好的经验法则。

回答by Tony Andrews

The optimiser will make its decision based on the relative cost of the full table scan and using the index. This mainly comes down to how many blocks will have to be read to satisfy the query. The 25%/75% rule of thumb mentioned in another answer is simplistic: in some cases a full table scan will make sense even to get 1% of the rows - i.e. if those rows happen to be spread around many blocks.

优化器将根据全表扫描和使用索引的相对成本做出决定。这主要归结为必须读取多少块才能满足查询。另一个答案中提到的 25%/75% 经验法则很简单:在某些情况下,全表扫描甚至可以获取 1% 的行 - 即,如果这些行恰好分布在许多块周围。

For example, consider this table:

例如,考虑这个表:

SQL> create table t1 as select object_id, object_name from all_objects;

Table created.
SQL> alter table t1 modify object_id null;

Table altered.

SQL> update t1 set object_id = null
  2  where mod(object_id,100) != 0
  3  /

84558 rows updated.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> select count(*) from t1 where object_id is not null;

  COUNT(*)
----------
       861    

As you can see, only approximately 1% of the rows in T1 have a non-null object_id. But due to the way I built the table, these 861 rows will be spread more or less evenly around the table. Therefore, the query:

如您所见,T1 中只有大约 1% 的行具有非空 object_id。但是由于我构建表格的方式,这 861 行将或多或少均匀地分布在表格周围。因此,查询:

select * from t1 where object_id is not null;

is likely to visit almost every block in T1 to get data, even if the optimiser used the index. It makes sense then to dispense with the index and go for a full table scan!

很可能访问 T1 中的几乎每个块来获取数据,即使优化器使用了索引。然后省去索引并进行全表扫描是有道理的!

A key statistic to help identify this situation is the index clustering factor:

帮助识别这种情况的一个关键统计数据是索引集群因子:

SQL> select clustering_factor from user_indexes where index_name='T1_IDX';

CLUSTERING_FACTOR
-----------------
              460

This value 460 is quite high (compared to the 861 rows in the index), and suggests that a full table scan will be used. See this DBAZine article on clustering factors.

这个值 460 相当高(与索引中的 861 行相比),并表明将使用全表扫描。请参阅这篇关于聚类因子的 DBAZine 文章

回答by Otávio Décio

If you are doing a select *, then it would make sense to do a table scan rather than using the index. If you know which columns you are interested in, you could create a covered index with those colums plus the one you are applying the IS NOT NULL condition.

如果您正在执行 select *,那么进行表扫描而不是使用索引是有意义的。如果您知道您对哪些列感兴趣,则可以使用这些列加上您正在应用 IS NOT NULL 条件的列创建一个覆盖索引。

回答by Nick Pierpoint

It can depend on the type of index you have on the table.

它可能取决于您在表上的索引类型。

Most B-tree indexes do notstore null entries. Bitmap indexes dostore null entries.

大多数 B 树索引存储空条目。位图索引确实存储空条目。

So, if you have:

所以,如果你有:

select * from mytable where mycolumn is null

select * from mytable 其中 mycolumn 为空

and you have a standard B-tree index on mycolumn, then the query can'tuse the index as the "null" isn't in the index.

并且您在 上有一个标准的 B 树索引mycolumn,则查询不能使用该索引,因为“null”不在索引中。

(If the index is against multiple columns, and one of the indexed columns is not null then there will be an entry in the index.)

(如果索引针对多个列,并且索引列之一不为空,则索引中将有一个条目。)

回答by Sebastian Good

It's also worth checking whether Oracle's statistics on the table are up to date. It may not know that a full table scan will be slower.

还值得检查 Oracle 在该表上的统计信息是否是最新的。它可能不知道全表扫描会更慢。

回答by tpdi

Create an index on that column.

在该列上创建索引。

To make sure the index is used, it should be on the index and other columns in the where.

为了确保索引被使用,它应该在索引和其他列的位置上。

ocdecio answered:

ocdecio 回答:

If you are doing a select *, then it would make sense to do a table scan rather than using the index.

如果您正在执行 select *,那么进行表扫描而不是使用索引是有意义的。

That's not strictly true; an index will be used if there is an index that fits your where clause, and the query optimizer decides using that index would be faster than doing a table scan. If there is no index, or no suitable index, only then must a table scan be done.

严格来说,这不是真的。如果存在适合您的 where 子句的索引,则将使用索引,并且查询优化器决定使用该索引比执行表扫描更快。如果没有索引,或没有合适的索引,则必须进行表扫描。

回答by riprop

Oracle database don't index null values at all in regular (b-tree) indexes, so it can't use it nor you can't force oracle database to use it.

Oracle 数据库在常规(b 树)索引中根本不索引空值,因此它不能使用它,也不能强制 oracle 数据库使用它。

BR

BR

回答by Nicolas de Fontenay

Using hints should be done only as a work around rather than a solution.

使用提示只能作为一种解决方法而不是解决方案。

As mentioned in other answers, the null value is not available in B-TREE indexes.

如其他答案中所述,空值在 B-TREE 索引中不可用。

Since you know that you have mostly null values in this column, would you be able to replace the null value by a range for instance.

由于您知道此列中大部分为空值,因此您能否将空值替换为一个范围。

That really depends on your column and the nature of your data but typically, if your column is a date type for instance:

这实际上取决于您的列和数据的性质,但通常,如果您的列是日期类型,例如:

where mydatecolumn is not nullCan be translated in a rule saying: I want all rows which have a date.

where mydatecolumn is not null可以翻译成一条规则说:我想要所有有日期的行。

Then you can most definitely do this: where mydatecolumn <=sysdate (in oracle)

那么你绝对可以这样做: where mydatecolumn <=sysdate (in oracle)

This will return all rows with a date and ommit null values while taking advantage of the index on that column without using any hints.

这将返回带有日期的所有行并忽略空值,同时利用该列上的索引而不使用任何提示。

回答by Slump

See http://www.oracloid.com/2006/05/using-index-for-is-null/

http://www.oracroid.com/2006/05/using-index-for-is-null/

If your index is on one single field, it will NOT be used. Try to add a dummy field or a constant in the index:

如果您的索引位于单个字段上,则不会使用它。尝试在索引中添加一个虚拟字段或常量:

create index tind on t(field_to_index, 1);