Oracle 如何计算解释计划中的成本?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9982339/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 00:42:11  来源:igfitidea点击:

How does Oracle calculate the cost in an explain plan?

performanceoracleoracle10goracle11gquery-tuning

提问by Gaurav Soni

Can anyone explain how the cost is evaluated in an Oracle explain plan? Is there any specific algorithm to determine the cost of the query?

谁能解释如何在 Oracle 解释计划中评估成本?是否有任何特定算法来确定查询的成本?

For example: full table scans have higher cost, index scan lower... How does Oracle evaluate the cases for full table scan, index range scan, etc.?

例如:全表扫描,具有较高的性价比,索引扫描下... Oracle如何评估的情况下 full table scanindex range scan等?

This link is same as what I am asking: Question about Cost in Oracle Explain Plan

此链接与我要问的相同:有关 Oracle 解释计划中的成本的问题

But can anyone explain with an example, we can find the cost by executing explain plan, but how does it work internally?

但是谁能用一个例子来解释,我们可以通过执行找到成本explain plan,但它内部是如何工作的?

回答by Justin Cave

There are many, many specific algorithms for computing the cost. Far more than could realistically be discussed here. Jonathan Lewis has done an admirable job of walking through how the cost-based optimizer decides on the cost of a query in his book Cost-Based Oracle Fundamentals. If you're really interested, that's going to to be the best place to start.

有许多用于计算成本的特定算法。在这里讨论的远不止现实。Jonathan Lewis 在他的书Cost-Based Oracle Fundamentals 中完成了一项令人钦佩的工作,他介绍了基于成本的优化器如何决定查询的成本。如果您真的感兴趣,那将是最好的起点。

It is a fallacy to assume that full table scans will have a higher cost than, say, an index scan. It depends on the optimizer's estimates of the number of rows in the table and the optimizer's estimates of the number of rows the query will return (which, in turn, depends on the optimizer's estimates of the selectivity of the various predicates), the relative cost of a sequential read vs. a serial read, the speed of the processor, the speed of the disk, the probability that blocks will be available in the buffer cache, your database's optimizer settings, your session's optimizer settings, the PARALLELattribute of your tables and indexes, and a whole bunch of other factors (this is why it takes a book to really start to dive into this sort of thing). In general, Oracle will prefer a full table scan if your query is going to return a large fraction of the rows in your table and an index access if your query is going to return a small fraction of the rows in your table. And "small fraction" is generally much smaller than people initially estimate-- if you're returning 20-25% of the rows in a table, for example, you're almost always better off using a full table scan.

假设全表扫描的成本高于索引扫描,这是一种谬论。它取决于优化器对表中行数的估计以及优化器对查询将返回的行数的估计(这反过来又取决于优化器对各种谓词的选择性的估计)、相对成本顺序读取与串行读取的比较、处理器的速度、磁盘的速度、块在缓冲区缓存中可用的概率、数据库的优化器设置、会话的优化器设置、PARALLEL表和索引的属性,以及一大堆其他因素(这就是为什么需要一本书才能真正开始深入研究这类事情的原因)。一般来说,如果您的查询将返回表中的大部分行,Oracle 将更喜欢全表扫描,如果您的查询将返回表中的一小部分行,则 Oracle 将更喜欢索引访问。并且“小部分”通常比人们最初估计的要小得多——例如,如果您要返回表中 20-25% 的行,那么最好使用全表扫描。

If you are trying to use the COSTcolumn in a query plan to determine whether the plan is "good" or "bad", you're probably going down the wrong path. The COSTis only valid if the optimizer's estimates are accurate. But the most common reason that query plans would be incorrect is that the optimizer's estimates are incorrect (statistics are incorrect, Oracle's estimates of selectivity are incorrect, etc.). That means that if you see one plan for a query that has a cost of 6 and a plan for a different version of that query that has a cost of 6 million, it is entirely possible that the plan that has a cost of 6 million is more efficient because the plan with the low cost is incorrectly assuming that some step is going to return 1 row rather than 1 million rows.

如果您尝试使用COST查询计划中的列来确定计划是“好”还是“坏”,那么您可能走错了路。该COST如果优化的估计是准确的才有效。但查询计划不正确的最常见原因是优化器的估计不正确(统计不正确、Oracle 对选择性的估计不正确等)。这意味着,如果您看到一个查询的成本为 6 的计划和该查询的不同版本的成本为 600 万的计划,则成本为 600 万的计划完全有可能是效率更高,因为低成本的计划错误地假设某个步骤将返回 1 行而不是 100 万行。

You are much better served ignoring the COSTcolumn and focusing on the CARDINALITYcolumn. CARDINALITYis the optimizer's estimate of the number of rows that are going to be returned at each step of the plan. CARDINALITYis something you can directly test and compare against. If, for example, you see a step in the plan that involves a full scan of table A with no predicates and you know that A has roughly 100,000 rows, it would be concerning if the optimizer's CARDINALITYestimate was either way too high or way too low. If it was estimating the cardinality to be 100 or 10,000,000 then the optimizer would almost certainly be either picking the table scan in error or feeding that data into a later step where its cost estimate would be grossly incorrect leading it to pick a poor join order or a poor join method. And it would probably indicate that the statistics on table A were incorrect. On the other hand, if you see that the cardinality estimates at each step is reasonably close to reality, there is a very good chance that Oracle has picked a reasonably good plan for the query.

忽略该COST列并专注于该CARDINALITY列会更好。 CARDINALITY是优化器对计划每一步将返回的行数的估计。 CARDINALITY是您可以直接测试和比较的东西。例如,如果您看到计划中的一个步骤涉及对表 A 的无谓词的完整扫描,并且您知道 A 大约有 100,000 行,那么优化器的CARDINALITY估计要么太高要么太低。如果它估计基数为 100 或 10,000,000,那么优化器几乎肯定会错误地选择表扫描或将该数据提供给后面的步骤,在该步骤中其成本估计将严重不正确导致它选择一个糟糕的连接顺序或一个糟糕的连接方法。这可能表明表 A 上的统计数据不正确。另一方面,如果您发现每一步的基数估计值都相当接近实际情况,则 Oracle 很有可能为查询选择了一个相当不错的计划。

回答by Dave Costa

Another place to get started on understanding the CBO's algorithms is this paperby Wolfgang Breitling. Jonathan Lewis's book is more detailed and more recent, but the paper is a good introduction.

了解 CBO 算法的另一个起点是Wolfgang Breitling 的这篇论文。乔纳森刘易斯的书更详细,也更近,但这篇论文是一个很好的介绍。

回答by APC

In the 9i documentationOracle produced an authoratative looking mathematical model for cost:

9i 文档中,Oracle 生成了一个权威的成本数学模型:

Cost =  (#SRds * sreadtim + 
           #MRds * mreadtim +  
           #CPUCycles / cpuspeed ) / sreadtim

where:

  • #SRDs is the number of single block reads
  • #MRDs is the number of multi block reads
  • #CPUCycles is the number of CPU Cycles *)
  • sreadtim is the single block read time
  • mreadtim is the multi block read time
  • cpuspeed is the CPU cycles per second
Cost =  (#SRds * sreadtim + 
           #MRds * mreadtim +  
           #CPUCycles / cpuspeed ) / sreadtim

在哪里:

  • #SRDs 是单块读取次数
  • #MRDs 是多块读取的次数
  • #CPUCycles 是 CPU 周期数 *)
  • sreadtim 是单块读取时间
  • mreadtim 是多块读取时间
  • cpuspeed 是每秒 CPU 周期数

So it gives a good idea of the factors which go into calculating cost. This was why Oracle introduced the capability to gather system statistics: to provide accurate values for CPU speed, etc

因此,它很好地了解了计算成本的因素。这就是 Oracle 引入收集系统统计信息功能的原因:为 CPU 速度等提供准确的值

Now we fast forward to the equivalent 11g documentationand we find the maths has been replaced with a cursory explanation:

现在我们快进到等效的 11g 文档,我们发现数学已被粗略的解释所取代:

"Cost of the operation as estimated by the optimizer's query approach. Cost is not determined for table access operations. The value of this column does not have any particular unit of measurement; it is merely a weighted value used to compare costs of execution plans. The value of this column is a function of the CPU_COST and IO_COST columns."

“优化器的查询方法估计的操作成本。成本不是为表访问操作确定的。该列的值没有任何特定的度量单位;它只是用于比较执行计划成本的加权值。此列的值是 CPU_COST 和 IO_COST 列的函数。”

I think this reflects the fact that costjust isn't a very reliable indicator of execution time. Jonathan Lewis recently posted a pertinent blog piece. He shows two similar looking queries; their explain plans are different but they have identical costs. Nevertheless when it comes to runtime one query performs considerably slower than the other. Read it here.

我认为这反映了一个事实,即cost它不是一个非常可靠的执行时间指标。Jonathan Lewis 最近发布了一篇相关的博客文章。他展示了两个相似的查询;他们的解释计划不同,但成本相同。然而,当涉及到运行时,一个查询的执行速度要比另一个慢得多。 在这里阅读