Optimizing Delete on SQL Server

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverFlow

Original question: http://stackoverflow.com/questions/955435/
Asked by pomarc
Deletes on SQL Server are sometimes slow, and I've often needed to optimize them to reduce the time required. I've been googling for tips on how to do that and have found diverse suggestions. I'd like to know your favorite and most effective techniques to tame the delete beast, and how and why they work.
So far:
- be sure foreign keys have indexes
- be sure the WHERE conditions are indexed
- use of WITH (ROWLOCK)
- drop unused indexes, delete, then rebuild the indexes
now, your turn.
Accepted answer by John Sansom
The following article, Fast Ordered Delete Operations, may be of interest to you.
Performing fast SQL Server delete operations
The solution focuses on utilising a view in order to simplify the execution plan produced for a batched delete operation. This is achieved by referencing the given table once, rather than twice which in turn reduces the amount of I/O required.
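A sketch of the idea (the table, view, and column names here are illustrative, not taken from the article): the view encapsulates "TOP n ordered", so the batched DELETE references the base table only once in the plan.

```sql
-- Hypothetical example: repeatedly delete the 2000 oldest rows.
-- ORDER BY is permitted in a view only together with TOP, which
-- is exactly what this technique relies on.
CREATE VIEW dbo.vOldestLogRows
AS
SELECT TOP 2000 *
FROM dbo.LogTable
ORDER BY LogDate;
GO

-- Each execution removes one batch of the oldest rows
DELETE FROM dbo.vOldestLogRows;
```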
Answered by Erich Kitzmueller
I have much more experience with Oracle, but very likely the same applies to SQL Server as well:
- when deleting a large number of rows, issue a table lock, so the database doesn't have to take lots of row locks
- if the table you delete from is referenced by other tables, make sure those other tables have indexes on the foreign key column(s) (otherwise the database will do a full table scan of the other table for each deleted row, to ensure that deleting the row doesn't violate the foreign key constraint)
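For example (the table and constraint names here are invented for illustration), an unindexed foreign key forces a scan of the child table for every parent row you delete; a plain index on the FK column fixes that:

```sql
-- Child table references Customers(CustomerID)
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

-- Without this index, every DELETE on dbo.Customers must scan
-- dbo.Orders to verify no orphaned rows would be left behind
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
```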
Answered by quillbreaker
I wonder if it's time for garbage-collecting databases? You mark a row for deletion and the server deletes it later during a sweep. You wouldn't want this for every delete - because sometimes a row must go now - but it would be handy on occasion.
Answered by xero
Summary of Answers through 2014-11-05
This answer is flagged as community wiki since this is an ever-evolving topic with a lot of nuances, but very few possible answers overall.
The first issue is that you must ask yourself what scenario you're optimizing for. This is generally either performance with a single user on the db, or scale with many users on the db. Sometimes the answers are the exact opposite.
For single user optimization
- Hint a TABLOCK
- Remove indexes not used in the delete, then rebuild them afterward
- Batch using something like SET ROWCOUNT 20000 (or whatever, depending on log space) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0)
- If deleting a large % of the table, just copy the rows to keep into a new table and delete the old table
- Partition the rows to delete, then drop the partition. [Read more...]
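The batching step above can be sketched like this on modern SQL Server (DELETE TOP is the current idiom; SET ROWCOUNT for limiting DML is deprecated). The table name, predicate, batch size, and delay are illustrative:

```sql
-- Delete in chunks so each transaction (and the log growth
-- it causes) stays small, and locks are released between batches
WHILE 1 = 1
BEGIN
    DELETE TOP (20000)
    FROM dbo.BigTable
    WHERE CreatedDate < '2010-01-01';

    IF @@ROWCOUNT = 0
        BREAK;  -- nothing left to delete

    WAITFOR DELAY '00:00:01';  -- give other workloads a chance to run
END
```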
For multi user optimization
- Hint row locks
- Use the clustered index
- Design clustered index to minimize page re-organization if large blocks are deleted
- Update "is_deleted" column, then do actual deletion later during a maintenance window
For general optimization
- Be sure FKs have indexes on their source tables
- Be sure the WHERE clause has indexes
- Identify the rows to delete in the WHERE clause with a view or derived table instead of referencing the table directly. [Read more...]
Answered by Christian Hayter
To be honest, deleting a million rows from a table scales just as badly as inserting or updating a million rows. It's the size of the rowset that's the problem, and there's not much you can do about that.
My suggestions:
- Make sure that the table has a primary key and clustered index (this is vital for all operations).
- Make sure that the clustered index is such that minimal page re-organisation would occur if a large block of rows were to be deleted.
- Make sure that your selection criteria are SARGable.
- Make sure that all your foreign key constraints are currently trusted.
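To check and repair the last point, something along these lines should work (the catalog view and the WITH CHECK CHECK syntax are real; the table and constraint names are hypothetical):

```sql
-- List foreign keys the optimizer does not currently trust
SELECT name, is_not_trusted
FROM sys.foreign_keys
WHERE is_not_trusted = 1;

-- Re-validate one against existing data so it becomes trusted again
ALTER TABLE dbo.Orders
    WITH CHECK CHECK CONSTRAINT FK_Orders_Customers;
```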
Answered by Marc Gravell
(if the indexes are "unused", why are they there at all?)
One option I've used in the past is to do the work in batches. The crude way would be to use SET ROWCOUNT 20000 (or whatever) and loop (perhaps with a WAITFOR DELAY) until you get rid of it all (@@ROWCOUNT = 0).

This might help reduce the impact upon other systems.
Answered by Matt
The problem is you haven't defined your conditions enough. I.e. what exactly are you optimizing?
For example, is the system down for nightly maintenance and no users are on the system? And are you deleting a large % of the database?
If you're offline and deleting a large %, it may make sense to just build a new table with the data to keep, drop the old table, and rename. If deleting a small %, you likely want to batch things in as large batches as your log space allows. It entirely depends on your database, but dropping indexes for the duration of the rebuild may hurt or help -- if that's even possible, given that you're "offline".
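The "build a new table and rename" approach might look like this (names are invented; it only works offline because the table briefly disappears during the swap):

```sql
-- Copy only the rows you want to keep into a new table
SELECT *
INTO dbo.BigTable_new
FROM dbo.BigTable
WHERE CreatedDate >= '2010-01-01';

-- Swap the new table in place of the old one
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_new', 'BigTable';

-- Remember to recreate indexes, constraints, triggers,
-- and permissions afterward: SELECT INTO copies none of them
```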
If you're online, what's the likelihood your deletes are conflicting with user activity (and is user activity predominantly read, update, or what)? Or, are you trying to optimize for user experience or speed of getting your query done? If you're deleting from a table that's frequently updated by other users, you need to batch but with smaller batch sizes. Even if you do something like a table lock to enforce isolation, that doesn't do much good if your delete statement takes an hour.
When you define your conditions better, you can pick one of the other answers here. I like the link in Rob Sanders' post for batching things.
Answered by HLGEM
If you have lots of foreign key tables, start at the bottom of the chain and work up. The final delete will go faster and block fewer things if there are no child records to cascade delete (which I would NOT turn on if I had a large number of child tables, as it will kill performance).
Delete in batches.
If you have foreign key tables that are no longer being used (you'd be surprised how often production databases end up with old tables nobody will get rid of), get rid of them or at least break the FK/PK connection. There's no sense checking a table for records if it isn't being used.
Don't delete - mark records as deleted and then exclude marked records from all queries. This is best set up at the time of database design. A lot of people use this because it is also the fastest way to get back records accidentally deleted. But it is a lot of work to set up in an already existing system.
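A minimal sketch of this soft-delete pattern (the column, table, and view names are illustrative):

```sql
-- One-time schema change: add the flag column
ALTER TABLE dbo.Customers
    ADD IsDeleted bit NOT NULL DEFAULT 0;
GO

-- Queries go through a view that hides "deleted" rows
CREATE VIEW dbo.ActiveCustomers
AS
SELECT *
FROM dbo.Customers
WHERE IsDeleted = 0;
GO

-- "Deleting" is now just a cheap single-row update
UPDATE dbo.Customers
SET IsDeleted = 1
WHERE CustomerID = 42;
```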
Answered by Dave Markle
I'll add another one to this:
Make sure your transaction isolation level and database options are set appropriately. If your SQL Server is set not to use row versioning, or you're using an isolation level on other queries that makes them wait for the rows to be deleted, you could be setting yourself up for some very poor performance while the operation is happening.
Answered by RobS
On very large tables where you have a very specific set of criteria for deletes, you could also partition the table, switch out the partition, and then process the deletions.
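A sketch of the switch-out, assuming the table is already partitioned by the delete criterion (e.g. date) and an empty staging table with an identical schema exists on the same filegroup:

```sql
-- Metadata-only operation: instantly moves partition 1
-- (say, the oldest date range) out of the big table
ALTER TABLE dbo.BigTable
    SWITCH PARTITION 1 TO dbo.BigTable_staging;

-- The "delete" is now just emptying the staging table,
-- which never touches dbo.BigTable at all
TRUNCATE TABLE dbo.BigTable_staging;
```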
The SQLCAT team has been using this technique on really, really large volumes of data. I found some references to it here, but I'll try and find something more definitive.