Oracle LEADING 提示——为什么需要这样做?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2321698/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 20:03:03  来源:igfitidea点击:

Oracle LEADING hint -- why is this required?

sqloraclequery-hints

提问by aw crud

Suddenly (but unfortunately I don't know when "suddenly" was; I know it ran fine at some point in the past) one of my queries started taking 7+ seconds instead of milliseconds to execute. I have 1 local table and 3 tables being accessed via a DB link. The 3 remote tables are joined together, and one of them is joined with my local table.

突然(但不幸的是,我不知道“突然”是什么时候;我知道它在过去的某个时候运行良好)我的一个查询开始需要 7 秒以上而不是毫秒来执行。我有 1 个本地表和 3 个通过数据库链接访问的表。3 个远程表连接在一起,其中一个与我的本地表连接。

The local table's where clause only takes a few millis to execute on its own, and only returns a few (10's or 100's at the most) records. The 3 remote tables have many hundreds of thousands, possibly millions, of records between them, and if I join them appropriately I get tens or hundreds of thousands of records.

本地表的 where 子句自己执行只需要几毫秒,并且只返回少数(最多 10 条或 100 条)记录。3 个远程表之间有数十万甚至数百万条记录,如果我适当地加入它们,我会得到数万或数十万条记录。

I am only joining with the remote tables so that I can pull out a few pieces of data related to each record in my local table.

我只是加入远程表,以便我可以提取与本地表中每条记录相关的几条数据。

What appears to be happening, however, is that Oracle joins the remote tables together first and then my local table to that mess at the end. This is always going to be a bad idea, especially given the data set that exists right now, so I added a /*+ LEADING(local_tab remote_tab_1) */hint to my query and it now returns in milliseconds.

然而,似乎正在发生的事情是 Oracle 首先将远程表连接在一起,然后我的本地表最后连接到那个乱七八糟的地方。这总是一个坏主意,特别是考虑到现在存在的数据集,所以我/*+ LEADING(local_tab remote_tab_1) */向我的查询添加了一个提示,它现在以毫秒为单位返回。

I compared the explain plans and they are almost identical, save for a single BUFFER SORTon one of the remote tables.

我比较了解释计划,它们几乎相同,除了一个BUFFER SORT远程表上的单个计划。

I'm wondering what might cause Oracle to approach this the wrong way? Is it an index issue? What should I be looking for?

我想知道什么可能导致 Oracle 以错误的方式处理这个问题?是索引问题吗?我应该寻找什么?

采纳答案by Jens Schauder

When choosing an execution plan, oracle estimates costs for the different plans. One crucial information for that estimate is the amount of rows will get returned from a step of the execution plan. Oracle tries to estimate those using 'statistics', i.e. information about how many rows a table contains, how many different values a column contains; How evenly these values are distributed.

在选择执行计划时,oracle 会估算不同计划的成本。该估计的一个关键信息是从执行计划的一个步骤中返回的行数。Oracle 尝试使用“统计”来估计这些数据,即有关表包含多少行、一列包含多少不同值的信息;这些值的分布有多均匀。

These statistics are just that statistics, and they might be wrong, which is one of the most important reasons for misjudgments of the oracle optimizer.

这些统计只是那个统计,有可能是错误的,这是造成oracle优化器误判的最重要原因之一。

So gathering new statistics as described in a comment might help. Have a look at the documentation on that dbms_stats package. There are many different ways to call that package.

因此,按照评论中的描述收集新的统计数据可能会有所帮助。查看有关该 dbms_stats 包的文档。有许多不同的方法可以调用该包。

回答by Jeffrey Kemp

A common problem I've come across is a query that joins many tables, where the joins form a chain from one end to another, e.g.:

我遇到的一个常见问题是连接许多表的查询,其中连接形成从一端到另一端的链,例如:

SELECT *
FROM   tableA, tableB, tableC, tableD, tableE
WHERE  tableA.ID0 = :bind1
AND    tableA.ID1 = tableB.ID1
AND    tableB.ID2 = tableC.ID2
AND    tableC.ID3 = tableD.ID3
AND    tableD.ID4 = tableE.ID4
AND    tableE.ID5 = :bind2;

Notice how the optimiser might choose to drive the query from tableA (e.g. if the index on ID0 is nicely selective) or from tableE (if the index on tableE.ID5 is more selective).

请注意优化器如何选择从 tableA(例如,如果 ID0 上的索引具有很好的选择性)或从 tableE(如果 tableE.ID5 上的索引更具选择性)驱动查询。

The statistics on the tables might cause the choice between these two plans to balance on a knife-edge; one day it's working fine (driving from tableA), next day new stats are gathered and all of a sudden the alternative plan driving from tableE has a lower cost and is chosen.

表格上的统计数据可能会导致这两个计划之间的选择在刀刃上保持平衡;有一天它工作正常(从 tableA 驱动),第二天收集新的统计数据,突然间从 tableE 驱动的替代计划成本更低,并被选中。

In this circumstance, adding a LEADING hint is one wayto nudge it back to the original plan (i.e. drive from tableA) without dictating too much to the optimiser (i.e. it doesn't force the optimiser to choose any particular join methods).

在这种情况下,添加 LEADING 提示是一种将其推回原始计划(即从 tableA 驱动)而不对优化器进行过多指示(即它不强制优化器选择任何特定连接方法)的方法。

回答by Adam Musch

You're doing distributed query optimization, and that's a tricky beast. It could be that the your table's statistics are current, but now the tables at the remote system are out-of-whack or have changed. Or the remote system added/removed/modified indexes, and that broke your plan. (This is an excellent reason to consider replication -- so you can control indexes and statistics against it.)

你正在做分布式查询优化,这是一个棘手的野兽。可能是您的表的统计信息是最新的,但现在远程系统上的表已不正常或已更改。或者远程系统添加/删除/修改了索引,这破坏了您的计划。(这是考虑复制的一个很好的理由——这样你就可以控制它的索引和统计信息。)

That said, Oracle's estimate of cardinality is a primary driver in execution plan. A 10053 trace analysis (Jonathan Lewis' Cost-Based Oracle Fundamentals book has wonderful examples from 8i to 10.1) can help shed light on why your statement's now broken and how the LEADINGhint fixes it.

也就是说,Oracle 对基数的估计是执行计划的主要驱动因素。10053 跟踪分析(Jonathan Lewis 的 Cost-Based Oracle Fundamentals 一书中提供了从 8i 到 10.1 的精彩示例)可以帮助阐明为什么您的语句现在已损坏以及LEADING提示如何修复它。

The DRIVING_SITEhint might be a better choice if you know you always want the local tables to be joined first before going after the remote site; it clarifies your intention without driving the plan the way a LEADINGhint would.

DRIVING_SITE提示可能是,如果你知道你总是希望本地表先前的远程站点后会被加入了一个更好的选择; 它阐明了您的意图,而不会像LEADING提示那样推动计划。

回答by Gary Myers

Might not be relevant but I had a similar situation once where the remote table had been replaced by a single-table view. When it was a table the distributed query optimizer 'saw' that it had an index. When it became a view it couldn't see the index anymore and couldn't cost a plan that used an index on the remote object.

可能不相关,但我曾经遇到过类似的情况,其中远程表已被单表视图替换。当它是一个表时,分布式查询优化器“看到”它有一个索引。当它变成一个视图时,它就再也看不到索引了,也不能花费在远程对象上使用索引的计划。

That was a few years ago. I documented my analysis at the time here.

那是几年前的事了。我在这里记录了我当时的分析。

回答by Matthew Eyles

RI,

RI,

It's hard to be sure about the cause of the performance problems without seeing the SQL.

如果不查看 SQL,就很难确定性能问题的原因。

When an Oracle query was performing well before, and suddenly starts performing badly, it is usually related to one of two issues:

当 Oracle 查询之前性能良好,突然开始性能不佳时,通常与以下两个问题之一有关:

A) Statistics are out of date. This is the easiest and quickest thing to check, even if you have a housekeeping batch process that's supposed to take care of it ... always double-check.

B) Data volume / data pattern change.

A) 统计数据已经过时。这是检查的最简单和最快捷的方法,即使您有一个应该处理它的内务批处理过程......总是仔细检查。

B) 数据量/数据模式变化。

In your case, running a distributed query across multiple databases makes it 10x harder for Oracle to manage performance between them. Is it possible to put these tables in one database, perhaps separate schema owners in one database?

在您的情况下,跨多个数据库运行分布式查询会使 Oracle 管理它们之间的性能变得困难 10 倍。是否可以将这些表放在一个数据库中,或者将不同的模式所有者放在一个数据库中?

Hints are notoriously fragile, as Oracle is under no obligations to follow the hint. When the data volume or pattern changes some more, Oracle may just ignore the hint and do what it thinks is best (ie. worst ;-).

提示是出了名的脆弱,因为 Oracle 没有义务遵循提示。当数据量或模式发生更多变化时,Oracle 可能会忽略该提示并执行它认为最好的(即最坏的 ;-)。

If you cannot put these tables all in one database, then I recommend you look to break your query up into two statements:

如果您不能将这些表全部放在一个数据库中,那么我建议您将查询分解为两个语句:

  1. INSERT on sub-SELECT to copy external data to a global temporary table in your current database.
  2. SELECT from the global temporary table to join with your other table.
  1. INSERT on sub-SELECT 将外部数据复制到当前数据库中的全局临时表。
  2. 从全局临时表中 SELECT 以加入您的其他表。

You will have complete control over performance of step 1 above without resorting to hints. This approach typically scales well, providing you take time to do the performance tuning. I've seen this approach solve many complex performance problems.

您将完全控制上述步骤 1 的性能,而无需求助于提示。这种方法通常可以很好地扩展,让您花时间进行性能调整。我已经看到这种方法解决了许多复杂的性能问题。

The overhead for Oracle to create a whole new table, or insert a heap of records, is much smaller than most people expect. Defining a global temporary table further reduces that overhead.

Oracle 创建一个全新的表或插入一堆记录的开销比大多数人预期的要小得多。定义一个全局临时表进一步减少了这种开销。

Matthew

马修