SQL 最喜欢的性能调优技巧
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18783/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Favourite performance tuning tricks
提问by Seibar
When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
当您有一个需要性能调优的查询或存储过程时,您首先要尝试哪些操作?
回答by AJ.
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
这是我总是给问我优化问题的人提供的方便花花公子的清单。
我们主要使用 Sybase,但大部分建议都适用。
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
例如,SQL Server 带有许多性能监控/调整位,但是如果您没有类似的东西(即使您有),那么我会考虑以下内容...
99% of problemsI have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
我见过的99% 的问题都是由于在 join 中放入太多表造成的。对此的解决方法是执行一半的连接(使用某些表)并将结果缓存在临时表中。然后在该临时表上执行其余的查询连接。
Query Optimisation Checklist
查询优化清单
- Run UPDATE STATISTICS on the underlying tables
- Many systems run this as a scheduled weekly job
- Delete records from underlying tables (possibly archive the deleted records)
- Consider doing this automatically once a day or once a week.
- Rebuild Indexes
- Rebuild Tables (bcp data out/in)
- Dump / Reload the database (drastic, but might fix corruption)
- Build new, more appropriate index
- Run DBCC to see if there is possible corruption in the database
- Locks / Deadlocks
- Ensure no other processes running in database
- Especially DBCC
- Are you using row or page level locking?
- Lock the tables exclusively before starting the query
- Check that all processes are accessing tables in the same order
- Ensure no other processes running in database
- Are indices being used appropriately?
- Joins will only use index if both expressions are exactly the same data type
- Index will only be used if the first field(s) on the index are matched in the query
- Are clustered indices used where appropriate?
- range data
- WHERE field between value1 and value2
- Small Joins are Nice Joins
- By default the optimiser will only consider the tables 4 at a time.
- This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
- Break up the Join
- Can you break up the join?
- Pre-select foreign keys into a temporary table
- Do half the join and put results in a temporary table
- Are you using the right kind of temporary table?
#temp
tables may perform much better than@table
variables with large volumes (thousands of rows).
- Maintain Summary Tables
- Build with triggers on the underlying tables
- Build daily / hourly / etc.
- Build ad-hoc
- Build incrementally or teardown / rebuild
- See what the query plan is with SET SHOWPLAN ON
- See what's actually happenning with SET STATS IO ON
- Force an index using the pragma: (index: myindex)
- Force the table order using SET FORCEPLAN ON
- Parameter Sniffing:
- Break Stored Procedure into 2
- call proc2 from proc1
- allows optimiser to choose index in proc2 if @parameter has been changed by proc1
- Can you improve your hardware?
- What time are you running? Is there a quieter time?
- Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
- 在基础表上运行 UPDATE STATISTICS
- 许多系统将此作为计划的每周作业运行
- 从底层表中删除记录(可能归档已删除的记录)
- 考虑每天或每周自动执行一次此操作。
- 重建索引
- 重建表(bcp 数据输出/输入)
- 转储/重新加载数据库(剧烈,但可能会修复损坏)
- 建立新的、更合适的索引
- 运行 DBCC 以查看数据库中是否可能存在损坏
- 锁/死锁
- 确保没有其他进程在数据库中运行
- 特别是DBCC
- 您使用的是行级锁定还是页级锁定?
- 在开始查询之前以独占方式锁定表
- 检查所有进程是否以相同的顺序访问表
- 确保没有其他进程在数据库中运行
- 是否正确使用了索引?
- 仅当两个表达式的数据类型完全相同时,联接才会使用索引
- 仅当索引上的第一个字段在查询中匹配时才会使用索引
- 是否在适当的地方使用了聚集索引?
- 范围数据
- value1 和 value2 之间的 WHERE 字段
- 小连接是好的连接
- 默认情况下,优化器一次只考虑表 4。
- 这意味着在超过 4 个表的连接中,它很有可能选择一个非最优的查询计划
- 分手加入
- 你能打破连接吗?
- 将外键预选到临时表中
- 做一半的连接并将结果放在临时表中
- 您使用的是正确类型的临时表吗?
#temp
表的性能可能比@table
大容量(数千行)的变量好得多。
- 维护汇总表
- 在底层表上使用触发器构建
- 每天/每小时/等构建。
- 构建临时
- 增量构建或拆卸/重建
- 使用 SET SHOWPLAN ON 查看查询计划是什么
- 使用 SET STATS IO ON 查看实际发生的情况
- 使用 pragma 强制索引:(index: myindex)
- 使用 SET FORCEPLAN ON 强制表顺序
- 参数嗅探:
- 将存储过程分解为 2
- 从 proc1 调用 proc2
- 如果 @parameter 已被 proc1 更改,则允许优化器在 proc2 中选择索引
- 你能改进你的硬件吗?
- 你几点跑?有更安静的时间吗?
- Replication Server(或其他不间断进程)是否正在运行?你能暂停吗?运行它,例如。每小时?
回答by Mark Brackett
- Have a pretty good idea of the optimal path of running the query in your head.
- Check the query plan - always.
- Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
- Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
- Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
- Common problems I've seen are:
- If there's a lot of joins, sometimes Sql Server will choose to expand the joins, and then apply WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or a derived table with the conditions inlined. Views can cause the same problems.
- Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
- Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
- Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
- 对运行查询的最佳路径有一个很好的想法。
- 检查查询计划 - 始终。
- 打开 STATS,以便您可以检查 IO 和 CPU 性能。专注于降低这些数字,不一定是查询时间(因为这可能会受到其他活动、缓存等的影响)。
- 寻找进入运算符的大量行,但出现少量行。通常,索引会通过限制进入的行数(节省磁盘读取)来提供帮助。
- 首先关注最大的成本子树。更改该子树通常会更改整个查询计划。
- 我见过的常见问题是:
- 如果连接很多,有时Sql Server 会选择扩展连接,然后应用WHERE 子句。您通常可以通过将 WHERE 条件移动到 JOIN 子句或带有内联条件的派生表来解决此问题。视图可能会导致相同的问题。
- 次优连接(LOOP vs HASH vs MERGE)。我的经验法则是当顶行与底行相比行数很少时使用 LOOP 连接,当集合大致相等且有序时使用 MERGE,其他所有行使用 HASH。添加连接提示将让您测试您的理论。
- 参数嗅探。如果您首先使用不切实际的值运行存储过程(例如,用于测试),那么缓存的查询计划可能不是您的生产值的最佳选择。再次运行 WITH RECOMPILE 应该可以验证这一点。对于某些存储过程,尤其是那些处理不同大小范围的过程(例如,今天和昨天之间的所有日期 - 这将需要一个 INDEX SEEK - 或者,去年和今年之间的所有日期 - 最好使用 INDEX SCAN ) 您可能每次都必须使用 WITH RECOMPILE 运行它。
- 错误的缩进......好吧,所以 Sql Server 没有这个问题 - 但我确实发现在我修复格式之前不可能理解查询。
回答by jason saldo
Slightly off topic but if you have control over these issues...
High level and High Impact.
有点离题,但如果你能控制这些问题......
高水平和高影响力。
- For high IO environments make sure your disks are for either RAID 10 or RAID 0+1 or some nested implementation of raid 1 and raid 0.
- Don't use drives less than 1500K.
- Make sure your disks are only used for your Database. IE no logging no OS.
- Turn off auto grow or similar feature. Let the database use all storage that is anticipated. Not necessarily what is currently being used.
- design your schema and indexes for the type queries.
- if it's a log type table (insert only) and must be in the DB don't index it.
- if your doing allot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
- Don't be afraid of replicating data in exchange for performance!
- 对于高 IO 环境,请确保您的磁盘适用于 RAID 10 或 RAID 0+1 或raid 1 和raid 0 的某些嵌套实现。
- 不要使用小于 1500K 的驱动器。
- 确保您的磁盘仅用于您的数据库。IE 没有日志没有操作系统。
- 关闭自动增长或类似功能。让数据库使用所有预期的存储。不一定是当前正在使用的。
- 为类型查询设计架构和索引。
- 如果它是一个日志类型表(仅插入)并且必须在数据库中,则不要对其进行索引。
- 如果您正在分配报告(具有许多连接的复杂选择),那么您应该考虑使用星型或雪花模式创建数据仓库。
- 不要害怕复制数据以换取性能!
回答by Will SQL for Food
CREATE INDEX
CREATE INDEX
Assure there are indexes available for your WHERE
and JOIN
clauses. This will speed data access greatly.
确保您的WHERE
andJOIN
子句有可用的索引。这将大大加快数据访问速度。
If your environment is a data mart or warehouse,indexes should abound for almost any conceivable query.
如果您的环境是数据集市或仓库,那么几乎所有可以想象的查询都应该有大量的索引。
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE,
and DELETE
operations.)
在事务环境中,索引的数量应该更少,并且它们的定义更具有战略意义,以便索引维护不会拖累资源。(索引维护是指必须更改索引的叶子以反映基础表中的更改,如INSERT, UPDATE,
和DELETE
操作。)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
此外,请注意索引中字段的顺序 - 字段的选择性越高(基数越高),它应该在索引中出现的越早。例如,假设您要查询二手汽车:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
价格通常具有较高的基数。可能只有几十种颜色可供选择,但很可能有数千种不同的要价。
Of these index choices, idx01
provides the faster path to satisfy the query:
在这些索引选择中,idx01
提供了满足查询的更快路径:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
这是因为与颜色选择相比,满足价格点的汽车更少,因此查询引擎要分析的数据要少得多。
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
我已经知道有两个非常相似的索引,仅在字段顺序上有所不同,以加快查询速度(名字、姓氏)和(姓氏、名字)。
回答by jandersson
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
我最近学到的一个技巧是 SQL Server 可以在更新语句中更新局部变量和字段。
UPDATE table
SET @variable = column = @variable + otherColumn
Or the more readable version:
或者更易读的版本:
UPDATE table
SET
@variable = @variable + otherColumn,
column = @variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
在实现递归计算时,我用它来替换复杂的游标/连接,并且在性能上也获得了很多。
Here's details and example code that made fantastic improvements in performance: http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx
以下是对性能进行了极大改进的详细信息和示例代码:http: //geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal。 aspx
回答by AlexCuse
@Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
@Terrapin isnull 和 coalesce 之间还有其他一些值得一提的差异(除了 ANSI 合规性,这对我来说很重要)。
回答by DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and Moreis a great book on this topic as is MySQL Performance Blog.
假设这里是 MySQL,使用 EXPLAIN 来找出查询发生了什么,确保尽可能有效地使用索引并尝试消除文件排序。高性能 MySQL:优化、备份、复制等是一本关于这个主题的好书,MySQL 性能博客也是。
回答by Ryan
Sometimes in SQL Server if you use an OR in a where clause it will really Hyman with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
有时在 SQL Server 中,如果您在 where 子句中使用 OR,它确实会提高性能。而不是使用 OR 只是做两个选择并将它们联合在一起。以 1000 倍的速度获得相同的结果。
回答by John Christensen
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
我通常会从连接开始 - 我将一次从查询中删除每个连接,然后重新运行查询以了解是否存在我遇到问题的特定连接。
回答by Mike
Look at the where clause - verify use of indexes / verify nothing silly is being done
查看 where 子句 - 验证索引的使用/验证没有做任何愚蠢的事情
where SomeComplicatedFunctionOf(table.Column) = @param --silly