Fastest way to update 120 Million records
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/3711217/
Asked by Bob Probst
I need to initialize a new field with the value -1 in a 120 Million record table.
Update table
set int_field = -1;
I let it run for 5 hours before canceling it.
I tried running it with transaction level set to read uncommitted with the same results.
Recovery Model = Simple.
MS SQL Server 2005
Any advice on getting this done faster?
Accepted answer by Peter Radocchia
The only sane way to update a table of 120M records is with a SELECT statement that populates a second table. You have to take care when doing this. Instructions below.
Simple Case
For a table w/out a clustered index, during a time w/out concurrent DML:
SELECT *, new_col = -1 INTO clone.BaseTable FROM dbo.BaseTable
- recreate indexes, constraints, etc on new table
- switch old and new w/ ALTER SCHEMA ... TRANSFER.
- drop old table
If you can't create a clone schema, a different table name in the same schema will do. Remember to rename all your constraints and triggers (if applicable) after the switch.
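Spelled out, the simple case might look like the following sketch (schema, table, and column names here are illustrative, not from the original answer):

```sql
-- create a working schema to hold the rebuilt table
CREATE SCHEMA clone AUTHORIZATION dbo;
GO
-- minimally logged copy that adds the new column in one pass
SELECT *, int_field = CAST(-1 AS int)
INTO clone.BaseTable
FROM dbo.BaseTable;
GO
-- recreate indexes, constraints, etc. on clone.BaseTable here, then swap:
CREATE SCHEMA backup_old AUTHORIZATION dbo;
GO
ALTER SCHEMA backup_old TRANSFER dbo.BaseTable;
ALTER SCHEMA dbo TRANSFER clone.BaseTable;
GO
-- once everything is verified, reclaim the space
DROP TABLE backup_old.BaseTable;
```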
Non-simple Case
First, recreate your BaseTable with the same name under a different schema, e.g. clone.BaseTable. Using a separate schema will simplify the rename process later.
- Include the clustered index, if applicable. Remember that primary keys and unique constraints may be clustered, but not necessarily so.
- Include identity columns and computed columns, if applicable.
- Include your new INT column, wherever it belongs.
- Do not include any of the following:
- triggers
- foreign key constraints
- non-clustered indexes/primary keys/unique constraints
- check constraints or default constraints. Defaults don't make much of a difference, but we're trying to keep things minimal.
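As a sketch, the stripped-down clone table keeps only the clustered key, the identity property, and the new column (the table and column definitions below are assumptions for illustration):

```sql
CREATE TABLE clone.BaseTable (
    Id        int IDENTITY(1,1) NOT NULL,  -- keep the identity property
    Col1      varchar(50) NOT NULL,
    Col2      datetime NULL,
    int_field int NOT NULL                 -- the new column, wherever it belongs
    -- clustered index only, per the list above
    CONSTRAINT PK_BaseTable PRIMARY KEY CLUSTERED (Id)
    -- no triggers, FKs, non-clustered indexes, defaults, or check constraints yet
);
```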
Then, test your insert w/ 1000 rows:
-- assuming an IDENTITY column in BaseTable
SET IDENTITY_INSERT clone.BaseTable ON
GO
INSERT clone.BaseTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT TOP 1000 Col1, Col2, Col3 = -1
FROM dbo.BaseTable
GO
SET IDENTITY_INSERT clone.BaseTable OFF
Examine the results. If everything appears in order:
- truncate the clone table
- make sure the database is in the bulk-logged or simple recovery model
- perform the full insert.
This will take a while, but not nearly as long as an update. Once it completes, check the data in the clone table to make sure everything is correct.
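One way to sanity-check the copy is to compare counts and rough checksums (a sketch; column names are assumptions, and CHECKSUM_AGG is a heuristic, not a proof of equality):

```sql
-- row counts must match exactly
SELECT (SELECT COUNT_BIG(*) FROM dbo.BaseTable)   AS source_rows,
       (SELECT COUNT_BIG(*) FROM clone.BaseTable) AS clone_rows;

-- quick content comparison on the columns the tables share
SELECT CHECKSUM_AGG(CHECKSUM(Col1, Col2)) FROM dbo.BaseTable;
SELECT CHECKSUM_AGG(CHECKSUM(Col1, Col2)) FROM clone.BaseTable;

-- every row should carry the new value; expect zero here
SELECT COUNT_BIG(*) FROM clone.BaseTable WHERE int_field <> -1;
```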
Then, recreate all non-clustered primary keys/unique constraints/indexes and foreign key constraints (in that order). Recreate default and check constraints, if applicable. Recreate all triggers. Recreate each constraint, index or trigger in a separate batch. eg:
ALTER TABLE clone.BaseTable ADD CONSTRAINT UQ_BaseTable UNIQUE (Col2)
GO
-- next constraint/index/trigger definition here
Finally, move dbo.BaseTable to a backup schema and clone.BaseTable to the dbo schema (or wherever your table is supposed to live).
-- -- perform first true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
-- GO
-- -- create a backup schema, if necessary
-- CREATE SCHEMA backup_20100914
-- GO
BEGIN TRY
BEGIN TRANSACTION
ALTER SCHEMA backup_20100914 TRANSFER dbo.BaseTable
-- -- perform second true-up operation here, if necessary
-- EXEC clone.BaseTable_TrueUp
ALTER SCHEMA dbo TRANSFER clone.BaseTable
COMMIT TRANSACTION
END TRY
BEGIN CATCH
SELECT ERROR_MESSAGE() -- add more info here if necessary
ROLLBACK TRANSACTION
END CATCH
GO
If you need to free up disk space, you may drop your original table at this time, though it may be prudent to keep it around a while longer.
Needless to say, this is ideally an offline operation. If you have people modifying data while you perform this operation, you will have to perform a true-up operation with the schema switch. I recommend creating a trigger on dbo.BaseTable to log all DML to a separate table. Enable this trigger before you start the insert. Then in the same transaction that you perform the schema transfer, use the log table to perform a true-up. Test this first on a subset of the data! Deltas are easy to screw up.
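A minimal sketch of such a logging trigger follows (table, key, and column names are assumptions; a real delta table would also need to capture the modified values, not just the keys):

```sql
CREATE TABLE dbo.BaseTable_DML_Log (
    LogId    bigint IDENTITY(1,1) PRIMARY KEY,
    BaseId   int NOT NULL,            -- key of the affected row
    Action   char(1) NOT NULL,        -- 'I', 'U', or 'D'
    LoggedAt datetime NOT NULL DEFAULT GETUTCDATE()
);
GO
CREATE TRIGGER trg_BaseTable_Log ON dbo.BaseTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- rows present in both inserted and deleted are updates
    INSERT dbo.BaseTable_DML_Log (BaseId, Action)
    SELECT COALESCE(i.Id, d.Id),
           CASE WHEN i.Id IS NOT NULL AND d.Id IS NOT NULL THEN 'U'
                WHEN i.Id IS NOT NULL THEN 'I'
                ELSE 'D' END
    FROM inserted i
    FULL OUTER JOIN deleted d ON i.Id = d.Id;
END;
```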
Answered by Mike Forman
If you have the disk space, you could use SELECT INTO and create a new table. It's minimally logged, so it would go much faster.
select t.*, int_field = CAST(-1 as int)
into mytable_new
from mytable t
-- create your indexes and constraints
GO
exec sp_rename mytable, mytable_old
exec sp_rename mytable_new, mytable
drop table mytable_old
Answered by K.C.
I break the task up into smaller units. Test with different batch size intervals for your table, until you find an interval that performs optimally. Here is a sample that I have used in the past.
declare @counter int
declare @numOfRecords int
declare @batchsize int
set @numOfRecords = (SELECT COUNT(*) AS NumberOfRecords FROM <TABLE> with(nolock))
set @counter = 0
set @batchsize = 2500
set rowcount @batchsize
while @counter < (@numOfRecords/@batchsize) + 1
begin
    set @counter = @counter + 1
    -- int_field <> -1 alone skips NULLs; include them, since a newly added column starts out NULL
    update <TABLE> set int_field = -1 where int_field <> -1 or int_field is null
end
set rowcount 0
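Note that SET ROWCOUNT is deprecated for data-modification statements in later SQL Server versions. An equivalent batching sketch using UPDATE TOP, which works from SQL Server 2005 onward (the table name is a placeholder):

```sql
DECLARE @batchsize int;
SET @batchsize = 2500;

WHILE 1 = 1
BEGIN
    -- TOP on the UPDATE limits each batch; NULLs must be matched explicitly
    UPDATE TOP (@batchsize) dbo.MyTable
    SET int_field = -1
    WHERE int_field IS NULL OR int_field <> -1;

    IF @@ROWCOUNT = 0 BREAK;  -- done when no qualifying rows remain
END
```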
Answered by Pablo Santa Cruz
If your int_field is indexed, remove the index before running the update. Then create your index again...
5 hours seems like a lot for 120 million recs.
Answered by BlackTigerX
set rowcount 1000000
Update table set int_field = -1 where int_field <> -1 or int_field is null
see how long that takes, adjust and repeat as necessary
Answered by aads
declare @cnt bigint
set @cnt = 1

while @cnt * 100 < 10000000
begin
    UPDATE top(100) [Imp].[dbo].[tablename]
    SET [col1] = xxxx
    WHERE [col1] is null

    print '@cnt: ' + convert(varchar, @cnt)
    set @cnt = @cnt + 1
end
Answered by Philip Kelley
When adding a new column ("initialize a new field") and setting a single value on every existing row, I use the following tactic:
ALTER TABLE MyTable
add NewColumn int not null
constraint MyTable_TemporaryDefault
default -1
ALTER TABLE MyTable
drop constraint MyTable_TemporaryDefault
If the column is nullable and you don't include a "declared" constraint, the column will be set to null for all rows.
Answered by dance2die
What I'd try first is to drop all constraints, indexes, triggers and full-text indexes before you update.
If the above wasn't performant enough, my next move would be to create a CSV file with the 120 million records and bulk import it using bcp.
Lastly, I'd create a new heap table (meaning a table with no primary key) with no indexes on a different filegroup, populate it with -1, partition the old table, and add the new partition using "switch".
Answered by Brad
Sounds like an indexing problem, like Pablo Santa Cruz mentioned. Since your update is not conditional, you can DROP the column and RE-ADD it with a DEFAULT value.
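A sketch of the drop/re-add approach, in the spirit of Philip Kelley's answer above (table, column, and constraint names are illustrative):

```sql
-- the old column and any data in it are lost, so only do this
-- when, as here, the update is unconditional anyway
ALTER TABLE MyTable DROP COLUMN int_field;

ALTER TABLE MyTable
    ADD int_field int NOT NULL
    CONSTRAINT DF_MyTable_int_field DEFAULT -1;
```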
Answered by alexber
In general, the recommendations are:
- Remove or just disable all INDEXES, TRIGGERS, CONSTRAINTS on the table;
- Perform COMMIT more often (e.g. after each 1000 records updated);
- Use SELECT ... INTO.
But in your particular case you should choose the most appropriate solution, or a combination of them.
Also bear in mind that sometimes an index can be useful, e.g. when you update a non-indexed column under some condition.