Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, link to the original address, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/20864934/
OPTION (RECOMPILE) is Always Faster; Why?
Asked by Chad Decker
I encountered an odd situation where appending OPTION (RECOMPILE) to my query causes it to run in half a second, while omitting it causes the query to take well over five minutes.
This is the case whether the query is executed from Query Analyzer or from my C# program via SqlCommand.ExecuteReader(). Calling (or not calling) DBCC FREEPROCCACHE or DBCC DROPCLEANBUFFERS makes no difference; query results are always returned instantaneously with OPTION (RECOMPILE) and take more than five minutes without it. The query is always called with the same parameters [for the sake of this test].
I'm using SQL Server 2008.
I'm fairly comfortable writing SQL but had never used an OPTION clause in a query before, and was unfamiliar with the whole concept of plan caches until scanning the posts on this forum. My understanding from the posts is that OPTION (RECOMPILE) is an expensive operation: it apparently creates a new lookup strategy for the query. So why is it, then, that subsequent queries omitting OPTION (RECOMPILE) are so slow? Shouldn't the subsequent queries be making use of the lookup strategy that was computed on the previous call, which included the recompilation hint?
Is it highly unusual to have a query that requires a recompilation hint on every single call?
Sorry for the entry-level question but I can't really make heads or tails of this.
UPDATE: I've been asked to post the query...
select acctNo, min(date) as earliestDate
from (
    select acctNo, tradeDate as date
    from datafeed_trans
    where feedid = @feedID and feedDate = @feedDate
    union
    select acctNo, feedDate as date
    from datafeed_money
    where feedid = @feedID and feedDate = @feedDate
    union
    select acctNo, feedDate as date
    from datafeed_jnl
    where feedid = @feedID and feedDate = @feedDate
) t1
group by t1.acctNo
OPTION (RECOMPILE)
When running the test from Query Analyzer, I prepend the following lines:
declare @feedID int
select @feedID=20
declare @feedDate datetime
select @feedDate='1/2/2009'
When calling it from my C# program, the parameters are passed in via the SqlCommand.Parameters property.
For the purposes of this discussion, you can assume that the parameters never change, so we can rule out sub-optimal parameter sniffing as the cause.
Accepted answer by Abe Miessler
There are times when using OPTION (RECOMPILE) makes sense. In my experience, the only time it is a viable option is when you are using dynamic SQL. Before you explore whether this makes sense in your situation, I would recommend rebuilding your statistics. This can be done by running the following:
EXEC sp_updatestats
And then recreating your execution plan. This will ensure that when your execution plan is created, it uses the latest information.
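As a concrete sketch of that advice (the table names come from the question; both steps carry some overhead, and DBCC FREEPROCCACHE clears plans server-wide, so try this on a non-production copy first):

```sql
-- Rebuild statistics with a full scan on the three tables the query reads.
UPDATE STATISTICS datafeed_trans WITH FULLSCAN;
UPDATE STATISTICS datafeed_money WITH FULLSCAN;
UPDATE STATISTICS datafeed_jnl WITH FULLSCAN;

-- Evict cached plans so the next execution compiles a fresh plan
-- against the updated statistics. (Server-wide; use with care.)
DBCC FREEPROCCACHE;
```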
Adding OPTION (RECOMPILE) rebuilds the execution plan every time your query executes. I have never heard that described as "creates a new lookup strategy", but maybe we are just using different terms for the same thing.
When a stored procedure is created (I suspect you are calling ad-hoc SQL from .NET, but if you are using a parameterized query then this ends up being a stored-procedure call), SQL Server attempts to determine the most effective execution plan for the query based on the data in your database and the parameters passed in (parameter sniffing), and then caches this plan. This means that if you create the query when there are 10 records in your database and then execute it when there are 100,000,000 records, the cached execution plan may no longer be the most effective.
In summary, I don't see any reason that OPTION (RECOMPILE) would be a benefit here. I suspect you just need to update your statistics and your execution plan. Rebuilding statistics can be an essential part of DBA work, depending on your situation. If you are still having problems after updating your stats, I would suggest posting both execution plans.
And to answer your question - yes, I would say it is highly unusual for your best option to be recompiling the execution plan every time you execute the query.
Answered by CodeCowboyOrg
Often when there is a drastic difference from run to run of a query, I find that it is one of five issues.
STATISTICS - Statistics are out of date. A database stores statistics on the range and distribution of values in the various columns of tables and indexes. This helps the query engine develop a "plan" of attack for how it will run the query, for example the type of method it will use to match keys between tables, such as using a hash or looking through the entire set. You can call Update Statistics on the entire database or just on certain tables or indexes. This slows the query down from one run to another because when statistics are out of date, the query plan is likely no longer optimal for the newly inserted or changed data (explained more below). It may not be proper to update statistics immediately on a production database, as there will be some overhead, slowdown, and lag depending on the amount of data to sample. You can also choose between a full scan and sampling to update statistics. If you look at the query plan, you can also view the statistics on the indexes in use with the command DBCC SHOW_STATISTICS (tablename, indexname). This will show you the distribution and ranges of the keys that the query plan bases its approach on.
PARAMETER SNIFFING - The cached query plan is not optimal for the particular parameters you are passing in, even though the query itself has not changed. For example, if you pass in a parameter which retrieves only 10 out of 1,000,000 rows, the query plan created may use a hash join; however, if the parameter you pass in will use 750,000 of the 1,000,000 rows, the plan created may be an index scan or table scan. In such a situation you can tell the SQL statement to use OPTION (RECOMPILE), or a stored procedure to use WITH RECOMPILE, to tell the engine this is a "single use plan" and not to use a cached plan which likely does not apply. There is no rule on how to make this decision; it depends on knowing how the query will be used by users.
INDEXES - It's possible that the query hasn't changed, but a change elsewhere, such as the removal of a very useful index, has slowed down the query.
ROWS CHANGED - The rows you are querying change drastically from call to call. Usually statistics are automatically updated in these cases. However, if you are building dynamic SQL or calling SQL within a tight loop, there is a possibility you are using an outdated query plan based on a drastically wrong number of rows or stale statistics. Again, OPTION (RECOMPILE) is useful in this case.
THE LOGIC - It's the logic; your query is no longer efficient. It was fine for a small number of rows, but it no longer scales. This usually involves more in-depth analysis of the query plan. For example, you can no longer do things in bulk but have to chunk the work and do smaller commits; or your cross product was fine for a smaller set but now consumes CPU and memory as it scales larger (this may also be true of DISTINCT); or you are calling a function for every row; or your key matches don't use an index because of CAST type conversions or NULLs or functions... Too many possibilities here.
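A minimal sketch of the two recompile forms mentioned under PARAMETER SNIFFING, using the question's datafeed_trans table (the procedure name is made up for illustration):

```sql
-- Statement-level hint: this one statement is recompiled on every execution.
DECLARE @feedID int = 20, @feedDate datetime = '20090102';

SELECT acctNo, tradeDate
FROM datafeed_trans
WHERE feedid = @feedID AND feedDate = @feedDate
OPTION (RECOMPILE);
GO

-- Procedure-level hint (hypothetical name): the whole procedure is
-- recompiled on every call and its plan is never cached.
CREATE PROCEDURE dbo.GetFeedTrades
    @feedID int,
    @feedDate datetime
WITH RECOMPILE
AS
    SELECT acctNo, tradeDate
    FROM datafeed_trans
    WHERE feedid = @feedID AND feedDate = @feedDate;
```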
In general, when you write a query you should have some mental picture of roughly how certain data is distributed within your table. A column, for example, can have an evenly distributed number of different values, or it can be skewed, with 80% of rows having a specific set of values; and the distribution may vary frequently over time or be fairly static. This will give you a better idea of how to build an efficient query, and when debugging query performance it gives you a basis for building a hypothesis as to why a query is slow or inefficient.
Answered by DWright
To add to the excellent list (given by @CodeCowboyOrg) of situations where OPTION (RECOMPILE) can be very helpful:
- Table variables. When you are using table variables, there will not be any pre-built statistics for the table variable, often leading to large differences between estimated and actual rows in the query plan. Using OPTION (RECOMPILE) on queries with table variables allows generation of a query plan with a much better estimate of the row counts involved. I had a particularly critical use of a table variable that was unusable, and which I was going to abandon, until I added OPTION (RECOMPILE). The run time went from hours to just a few minutes. That is probably unusual, but in any case, if you are using table variables and working on optimizing, it's well worth seeing whether OPTION (RECOMPILE) makes a difference.
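A minimal sketch of the table-variable pattern described above (the variable and the join are made up for illustration; the table is from the question):

```sql
-- Table variables carry no statistics, so without the hint the optimizer
-- assumes a tiny fixed row count for @ids regardless of its contents.
DECLARE @ids TABLE (acctNo int PRIMARY KEY);

INSERT INTO @ids (acctNo)
SELECT DISTINCT acctNo FROM datafeed_trans WHERE feedid = 20;

-- OPTION (RECOMPILE) compiles this statement after @ids is populated,
-- so the plan is built from the actual row count.
SELECT t.acctNo, t.tradeDate
FROM datafeed_trans AS t
JOIN @ids AS i ON i.acctNo = t.acctNo
OPTION (RECOMPILE);
```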
Answered by Cristian Solervicéns
The very first action before tuning queries is to defragment/rebuild the indexes and statistics; otherwise you're wasting your time.
You must check the execution plan to see whether it is stable (i.e. the same when you change the parameters). If not, you might have to create a covering index (in this case, for each table); knowing the system, you can create one that is useful for other queries too.
As an example:

create index idx01_datafeed_trans
on datafeed_trans (feedid, feedDate)
include (acctNo, tradeDate)
If the plan is stable, or you can stabilize it, you can execute the statement with sp_executesql('sql sentence') to save and reuse a fixed execution plan.
If the plan is unstable, you have to use an ad-hoc statement or EXEC('sql sentence') to evaluate and create an execution plan each time (or a stored procedure "with recompile").
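A sketch contrasting the two approaches with the question's query shape (the parameter values are illustrative):

```sql
-- Parameterized via sp_executesql: the plan is cached and reused
-- across calls, which is what you want when the plan is stable.
EXEC sp_executesql
    N'select acctNo, min(tradeDate) as earliestDate
      from datafeed_trans
      where feedid = @feedID and feedDate = @feedDate
      group by acctNo',
    N'@feedID int, @feedDate datetime',
    @feedID = 20, @feedDate = '20090102';

-- Ad-hoc string via EXEC: each distinct literal string compiles its own
-- plan, effectively re-evaluating the plan per parameter set.
EXEC ('select acctNo, min(tradeDate) as earliestDate
       from datafeed_trans
       where feedid = 20 and feedDate = ''20090102''
       group by acctNo');
```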
Hope it helps.
Answered by MonkeyPushButton
Necroing this question, but there's an explanation that no one seems to have considered.
STATISTICS - Statistics are not available or misleading
If all of the following are true:
- The columns feedid and feedDate are likely to be highly correlated (e.g. a feed id is more specific than a feed date and the date parameter is redundant information).
- There is no index with both columns as sequential columns.
- There are no manually created statistics covering both these columns.
Then SQL Server may incorrectly assume that the columns are uncorrelated, leading to lower-than-expected cardinality estimates when both restrictions are applied, and to a poor execution plan being selected. The fix in this case would be to create a statistics object linking the two columns, which is not an expensive operation.
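A minimal sketch of that fix on one of the question's tables (the statistics object name is made up):

```sql
-- Multi-column statistics let the optimizer see the correlation between
-- feedid and feedDate instead of multiplying their individual selectivities.
CREATE STATISTICS st_feedid_feedDate
ON datafeed_trans (feedid, feedDate);
```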