Mass Updates and Commit frequency in SQL Server

Original source: http://stackoverflow.com/questions/531222/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Asked by RussellH
My database background is mainly Oracle, but I've recently been helping with some SQL Server work. My group has inherited some SQL Server DTS packages that do daily loads and updates of large amounts of data. Currently it is running in SQL Server 2000, but will soon be upgraded to SQL Server 2005 or 2008. The mass updates are running too slowly.
One thing I noticed about the code is that some large updates are done in procedural code in loops, so that each statement only updates a small portion of the table in a single transaction. Is this a sound method to do updates in SQL server? Locking of concurrent sessions should not be an issue because user access to tables is disabled while the bulk loading takes place. I've googled around some, and found some articles suggesting that doing it this way conserves resources, and that resources are released each time an update commits, leading to greater efficiency. In Oracle this is generally a bad approach, and I've used single transactions for very large updates with success in Oracle. Frequent commits slow the process down and use more resources in Oracle.
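As a minimal sketch, the chunked, commit-per-batch loop described above might look something like this (the table, key and column names are hypothetical):

DECLARE @BatchSize INT, @MinId INT, @MaxId INT, @From INT
SET @BatchSize = 10000
SELECT @MinId = MIN(Id), @MaxId = MAX(Id) FROM MyTable
SET @From = @MinId

WHILE @From <= @MaxId
BEGIN
    BEGIN TRANSACTION

    -- each pass updates only one slice of the key range
    UPDATE MyTable
    SET SomeColumn = SomeColumn + 1
    WHERE Id >= @From AND Id < @From + @BatchSize

    COMMIT TRANSACTION            -- each slice is its own transaction

    SET @From = @From + @BatchSize
END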
My question is, for mass updates in SQL Server, is it generally a good practice to use procedural code, and commit many SQL statements, or to use one big statement to do the whole update?
Accepted answer by RussellH
Sorry Guys,
None of the above answers the question. They are just examples of how you can do things. The answer is: more resources get used with frequent commits; however, the transaction log cannot be truncated until a commit point. Thus, if your single spanning transaction is very big, it will cause the transaction log to grow and possibly fragment, which, if undetected, will cause problems later. Also, in a rollback situation, the duration is generally twice as long as the original transaction. So if your transaction fails after half an hour, it will take an hour to roll back and you can't stop it :-)
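Since log growth and rollback length are the risk being described, a couple of quick checks to keep an eye on while a big transaction runs might be:

DBCC SQLPERF(LOGSPACE)          -- log size and percentage used, per database
DBCC OPENTRAN('MyDatabase')     -- oldest active transaction holding up log truncation

('MyDatabase' here is a placeholder for whatever database the load runs against.)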
I have worked with SQL Server 2000/2005, DB2 and ADABAS, and the above is true for all of them. I don't really see how Oracle can work differently.
You could possibly replace the T-SQL with a bcp command, where you can set the batch size without having to code it.
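For reference, a bcp load with an explicit batch size might look something like this (the database, table, data file and server names are placeholders; -b sets the rows per batch, -c uses character format, -T uses a trusted connection):

bcp MyDatabase.dbo.MyTable in C:\loads\daily.dat -c -T -S MYSERVER -b 10000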
Issuing frequent commits in a single table scan is preferable to running multiple scans that each process a small number of rows, because generally, if a table scan is required, the whole table will be scanned even if you are only returning a small subset.
Stay away from snapshots. A snapshot will only increase the number of IOs and compete for IO and CPU.
Answered by beach
In general, I find it better to update in batches - typically in the range of 100 to 1,000 rows. It all depends on how your tables are structured: foreign keys? Triggers? Or just updating raw data? You need to experiment to see which scenario works best for you.
If I am in pure SQL, I will do something like this to help manage server resources:
SET ROWCOUNT 1000                 -- cap each DELETE at 1000 rows

WHILE 1 = 1
BEGIN
    DELETE FROM MyTable WHERE ...

    IF @@ROWCOUNT = 0             -- nothing left to delete
        BREAK
END

SET ROWCOUNT 0                    -- back to the default (no limit)
In this example, I am purging data. This would only work for an UPDATE if you could restrict or otherwise selectively update rows. (Or only insert xxxx number of rows into an auxiliary table that you can JOIN against.)
But yes, try not to update xx million rows at one time. It takes forever and if an error occurs, all those rows will be rolled back (which takes an additional forever.)
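Worth noting for the planned upgrade: SET ROWCOUNT is deprecated for INSERT/UPDATE/DELETE in newer versions of SQL Server, so on 2005 and later the same batching pattern would typically be written with TOP instead, roughly:

WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM MyTable WHERE ...

    IF @@ROWCOUNT = 0
        BREAK
END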
Answered by Sam Saffron
Well, everything depends.
But ... assuming your db is in single user mode or you have table locks (TABLOCKX) against all the tables involved, batches will probably perform worse. Especially if the batches are forcing table scans.
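As a sketch of that alternative - one big set-based statement with an exclusive table lock - it might look something like this (the table, column and filter are hypothetical):

UPDATE MyTable WITH (TABLOCKX)
SET SomeColumn = SomeColumn + 1
WHERE LoadDate = '20090210'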
The one caveat is that very complex queries will quite often consume resources on tempdb; if tempdb runs out of space (because the execution plan required a nasty, complicated hash join), you are in deep trouble.
Working in batches is a general practice that is quite often used in SQL Server (when it's not in snapshot isolation mode) to increase concurrency and avoid huge transaction rollbacks caused by deadlocks (you tend to get deadlocks galore when updating a 10 million row table that is active).
Answered by HLGEM
When you move to SQL Server 2005 or 2008, you will need to redo all those DTS packages in SSIS. I think you will be pleasantly surprised to see how much faster SSIS can be.
In general, in SQL Server 2000, you want to run things in batches of records if the whole set ties up the table for too long. If you are running the packages at night when the system is not in use, you may be able to get away with a set-based insert of the entire dataset. Row-by-row is always the slowest method, so avoid that if possible as well (especially if all the row-by-row inserts are in one giant transaction!). If you need 24-hour access with no downtime, you will almost certainly need to run in batches.
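To make "set-based" concrete, moving the whole load in one statement (staging and target table names here are hypothetical) looks roughly like:

INSERT INTO dbo.TargetTable (CustomerId, Amount, LoadDate)
SELECT CustomerId, Amount, LoadDate
FROM dbo.StagingTable
-- versus a cursor or client-side loop that inserts one row per statement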