Oracle 日期索引很慢。没有它,查询速度快 300 倍

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/22552299/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 02:17:25  来源:igfitidea点击:

Oracle Date index is slow. Query is 300 times faster without it

performanceoracledateindexingquery-optimization

提问by urbanmojo

I had an Oracle query as below that took 10 minutes or longer to run:

我有一个如下的 Oracle 查询需要 10 分钟或更长时间才能运行:

  select
      r.range_text as duration_range,
      nvl(count(c.call_duration),0) as calls,
      nvl(SUM(call_duration),0) as total_duration
      from
      call_duration_ranges r
      left join
      big_table c
      on c.call_duration BETWEEN r.range_lbound AND r.range_ubound
 and c.aaep_src = 'MAIN_SOURCE'
 and c.calltimestamp_local  >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
 AND c.calltimestamp_local <=  to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')
 and          c.destinationnumber LIKE substr( 'abc:[email protected]:5060;user=phone',1,8) || '%'    
 group by
      r.range_text
 order by
      r.range_text

If I changed the date part of the query to:

如果我将查询的日期部分更改为:

 (c.calltimestamp_local+0)  >= to_date('01-02-2014 00:00:00' ,'dd-MM-yyyy HH24:mi:ss')
 (AND c.calltimestamp_local+0) <=  to_date('28-02-2014 23:59:59','dd-MM-yyyy HH24:mi:ss')

It runs in 2 seconds. I did this based on another post to avoid using the date index. Seems counter intuitive though--the index slowing things down so much.

它在 2 秒内运行。我是根据另一篇文章这样做的,以避免使用日期索引。虽然似乎违反直觉——索引使事情变得如此缓慢。

Ran the explain plan and it seems identical between the new and updated query. Only difference is the the MERGE JOIN operation is 16,269 bytes in the old query and 1,218 bytes in the new query. Actually cardinality is higher in the old query as well. And I actually don't see an "INDEX" operation on the old or new query in the explain plan, just for the index on the destinationnumber field.

运行解释计划,新查询和更新查询之间似乎相同。唯一的区别是 MERGE JOIN 操作在旧查询中为 16,269 字节,在新查询中为 1,218 字节。实际上,旧查询中的基数也更高。而且我实际上在解释计划中没有看到对旧查询或新查询的“INDEX”操作,只是针对 destinationnumber 字段上的索引。

So why is the index slowing down the query so much? What can I do to the index--don't think using the "+0" is the best solution going forward...

那么为什么索引会如此减慢查询速度呢?我可以对索引做些什么——不要认为使用“+0”是最好的解决方案......

Querying for two days of data, suppressing use of destinationnumber index:

查询两天的数据,禁止使用destinationnumber索引:

0   SELECT STATEMENT            ALL_ROWS    329382  1218    14
1   SORT    GROUP BY            329382  1218    14
2   MERGE JOIN  OUTER           329381  1218    14
3   SORT    JOIN            4   308 14
4   TABLE ACCESS    FULL    CALL_DURATION_RANGES    ANALYZED    3   308 14
5   FILTER                      
6   SORT    JOIN            329377  65  1
7   TABLE ACCESS    BY GLOBAL INDEX ROWID   BIG_TABLE   ANALYZED    329376  65  1
8   INDEX   RANGE SCAN  IDX_CDR_CALLTIMESTAMP_LOCAL ANALYZED    1104        342104

Querying for 2 days using destinationnumber index:

使用 destinationnumber 索引查询 2 天:

0   SELECT STATEMENT            ALL_ROWS    11  1218    14
1   SORT    GROUP BY            11  1218    14
2   MERGE JOIN  OUTER           10  1218    14
3   SORT    JOIN            4   308 14
4   TABLE ACCESS    FULL    CALL_DURATION_RANGES    ANALYZED    3   308 14
5   FILTER                      
6   SORT    JOIN            6   65  1
7   TABLE ACCESS    BY GLOBAL INDEX ROWID   BIG_TABLE   ANALYZED    5   65  1
8   INDEX   RANGE SCAN  IDX_DESTINATIONNUMBER_PART  ANALYZED    4       4

Querying for one month, suppressing destinationnumber index--full scan:

查询一个月,抑制destinationnumber索引--全扫描:

0   SELECT STATEMENT            ALL_ROWS    824174  1218    14
1   SORT    GROUP BY            824174  1218    14
2   MERGE JOIN  OUTER           824173  1218    14
3   SORT    JOIN            4   308 14
4   TABLE ACCESS    FULL    CALL_DURATION_RANGES    ANALYZED    3   308 14
5   FILTER                      
6   SORT    JOIN            824169  65  1
7   PARTITION RANGE ALL         824168  65  1
8   TABLE ACCESS    FULL    BIG_TABLE   ANALYZED    824168  65  1

回答by APC

Seems counter intuitive though--the index slowing things down so much.

虽然似乎违反直觉——索引使事情变得如此缓慢。

Counter-intuitive only if you don't understand how indexes work.

仅当您不了解索引的工作原理时才违反直觉。

Indexes are good for retrieving individual rows. They are not suited to retrieving large numbers of records. You haven't bothered to provide any metrics but it seems likely your query is touching a large number of rows. In which case a full table scan or other set=based operation will be more much efficient.

索引适用于检索单个行。它们不适合检索大量记录。您没有费心提供任何指标,但您的查询似乎涉及大量行。在这种情况下,全表扫描或其他 set=based 操作会更有效率。



Tuning date range queries is tricky, because it's very hard for the database to know how many records lie between the two bounds, no matter how up-to-date our statistics are. (Even more tricky to tune when the date bounds can vary - one day is a different matter from one month or one year.) So often we need to help the optimizer by using our knowledge of our data.

调整日期范围查询很棘手,因为无论我们的统计数据有多最新,数据库都很难知道这两个边界之间有多少记录。(当日期范围发生变化时,调整更加棘手 - 一天与一个月或一年不同。)因此,我们经常需要利用我们对数据的了解来帮助优化器。

don't think using the "+0" is the best solution going forward...

不要认为使用“+0”是最好的解决方案......

Why not? People have been using that technique to avoid using an index in a specific query for literally decades.

为什么不?几十年来,人们一直在使用这种技术来避免在特定查询中使用索引。

However, there are more modern solutions. The undocumented cardinality hint is one:

但是,还有更现代的解决方案。未记录的基数提示是一个:

 select /*+ cardinality(big_table,10000) */ 

... should be enough to dissuade the optimizer from using an index - provided you have accurate statistics gathered for allthe tables in the query.

... 应该足以阻止优化器使用索引 - 前提是您已为查询中的所有表收集了准确的统计信息。

Alternatively you can force the optimizer to do a full table scan with ...

或者,您可以强制优化器使用 ...

 select /*+ full(big_table) */ 

Anyway, there's nothing you can do to the index to change the way databases work. You could make things faster with partitioning, but I would guess if your organisation had bought the Partitioning optionyou'd be using it already.

无论如何,您无法对索引做任何事情来改变数据库的工作方式。您可以通过分区加快速度,但我猜如果您的组织购买了分区选项,您就会已经在使用它。

回答by Drunix

These are the reasons using an index slows down a query:

这些是使用索引减慢查询速度的原因:

  1. A full tablescan would be faster. This happens if a substantial fraction of rows has to be retrieved. The concrete numbers depend on various factors, but as a rule of thumb in common situations using an index is slower if you retrieve more than 10-20% of your rows.

  2. Using another index would be even better, because fewer rows are left after the first stage. Using a certain index on a table usually means that other indexes cannot be used.

  1. 全表扫描会更快。如果必须检索大部分行,就会发生这种情况。具体数字取决于各种因素,但根据经验,在常见情况下,如果检索超过 10-20% 的行,使用索引的速度会较慢。

  2. 使用另一个索引会更好,因为在第一阶段之后剩下的行更少。在表上使用某个索引通常意味着不能使用其他索引。

Now it is the optimizers job to decide which variant is the best. To perform this task, he has to guess (among other things) how much rows are left after applying certain filtering clauses. This estimate is based on the tables statistics, and is usually quite ok. It even takes into account skewed data, but it might be off if either your statitiscs are outdated or you have a rather uncommon distribution of data. For example, if you computed your statistics before the data of february was inserted in your example, the optimizer might wrongly conclude that there are only few (if any) rows left after applaying the date range filter.

现在是优化器的工作来决定哪个变体是最好的。要执行此任务,他必须猜测(除其他外)在应用某些过滤子句后还剩下多少行。这个估计是基于表格统计的,通常是可以的。它甚至考虑了倾斜的数据,但如果您的统计数据已经过时或者您的数据分布相当不常见,则它可能会关闭。例如,如果您在示例中插入 2 月的数据之前计算统计数据,优化器可能会错误地得出结论,应用日期范围过滤器后只剩下很少(如果有)行。

Using combined indexes on several columns might also be an option dependent on your data.

在多个列上使用组合索引也可能是一个取决于您的数据的选项。

Another note on the "skewed data issue": There are cases the optimizer detects skewed data in column A if you have an index on cloumn A but not if you only have a combined index on columns A and B, because the combination might make the distribution more even. This is one of the few cases where an index on A,B does not make an index on A redundant.

关于“偏斜数据问题”的另一个注意事项:如果您在 A 列上有索引,优化器会在 A 列中检测到偏斜数据,但如果只有 A 列和 B 列上有组合索引,则不会,因为组合可能会使分布更均匀。这是 A、B 上的索引不会使 A 上的索引变得多余的少数情况之一。

APCs answer show how to use hints to direct the optimizer in the right direction if it still produces wrong plans even with right statistics.

APC 的回答显示了如果即使使用正确的统计信息仍然产生错误的计划,如何使用提示将优化器引导到正确的方向。