在原生 SQL 中对大型 INSERT 操作进行批量提交?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1602244/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Batch commit on large INSERT operation in native SQL?
提问by Cade Roux
I have a couple large tables (188m and 144m rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The keys on each table are over 50 composite bytes of columns. If the data was in tables, I could always think about using sp_rename to make the other new table, but that isn't really an option.
我有几个大表(188m 和 144m 行)我需要从视图中填充,但每个视图都包含几亿行(将伪维度建模数据合并为平面形式)。每个表上的键都超过 50 个组合字节的列。如果数据在表中,我总是可以考虑使用 sp_rename 来创建另一个新表,但这并不是一个真正的选择。
If I do a single INSERT operation, the process uses a huge amount of transaction log space, typically filling it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect)
如果我执行单次 INSERT 操作,这个过程会占用大量事务日志空间,通常会把日志填满,给 DBA 带来一堆麻烦。(是的,这可能本就该由 DBA 来处理/设计/规划)
I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).
我可以使用 SSIS 并通过批量提交将数据流式传输到目标表中(但这确实需要通过网络传输数据,因为我们不允许在服务器上运行 SSIS 包)。
Is there anything other than dividing the process up into multiple INSERT operations, using some kind of key to distribute the rows into different batches, and doing a loop?
除了把过程拆分成多个 INSERT 操作、用某种键把行分配到不同批次并循环执行之外,还有其他办法吗?
采纳答案by Arthur
You could partition your data and insert it in a cursor loop. That would be nearly the same as SSIS batch inserting, but it runs on your server.
您可以对数据进行分区,并在游标循环中分批插入数据。这与 SSIS 的批量插入几乎相同,但它运行在您的服务器上。
DECLARE @year INT, @month INT;
DECLARE c CURSOR FOR
    SELECT DISTINCT YEAR(DateCol), MONTH(DateCol) FROM whatever;
OPEN c;
FETCH NEXT FROM c INTO @year, @month;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO yourtable(...)
    SELECT * FROM whatever
    WHERE YEAR(DateCol) = @year AND MONTH(DateCol) = @month;
    FETCH NEXT FROM c INTO @year, @month;
END
CLOSE c;
DEALLOCATE c;
回答by Aaron Bertrand
Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:
视图是否有任何类型的唯一标识符/候选键?如果是这样,您可以使用以下方法将这些行选择到工作表中:
SELECT key_columns INTO dbo.temp FROM dbo.HugeView;
(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
(如果合适,可以把这个表放到另一个数据库中,例如使用 SIMPLE 恢复模式,以防止日志活动干扰您的主数据库。这样产生的日志会少得多;而且如果问题恰恰是各处磁盘空间都不足,您还可以在继续之前先释放那个数据库中的空间。)
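As a minimal sketch of that staging step (the StagingDB name and the explicit SIMPLE recovery setting are illustrative assumptions, not part of the answer):

```sql
-- Hypothetical staging database in SIMPLE recovery, so capturing the keys
-- generates minimal log and cannot fill up the primary database's log.
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Capture only the candidate-key columns from the huge view.
SELECT key_columns INTO StagingDB.dbo.temp FROM dbo.HugeView;
```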
Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:
然后你可以做这样的事情,一次插入 10,000 行,并在两者之间备份日志:
SET NOCOUNT ON;

DECLARE
    @batchsize INT,
    @ctr INT,
    @rc INT;

SELECT
    @batchsize = 10000,
    @ctr = 0;

WHILE 1 = 1
BEGIN
    ;WITH x AS
    (
        SELECT key_column, rn = ROW_NUMBER() OVER (ORDER BY key_column)
        FROM dbo.temp
    )
    INSERT dbo.PrimaryTable(a, b, c, etc.)
    SELECT v.a, v.b, v.c, etc.
    FROM x
    INNER JOIN dbo.HugeView AS v
        ON v.key_column = x.key_column
    WHERE x.rn > @batchsize * @ctr
        AND x.rn <= @batchsize * (@ctr + 1);

    IF @@ROWCOUNT = 0
        BREAK;

    BACKUP LOG PrimaryDB TO DISK = 'C:\db.bak' WITH INIT;

    SET @ctr = @ctr + 1;
END
That's all off the top of my head, so don't cut/paste/run, but I think the general idea is there. For more details (and why I backup log / checkpoint inside the loop), see this post on sqlperformance.com:
这些都是我凭印象随手写的,所以不要直接剪切/粘贴/运行,但我认为大体思路是对的。有关更多细节(以及为什么我在循环内备份日志/做检查点),请参阅 sqlperformance.com 上的这篇文章:
Note that if you are taking regular database and log backups, you will probably want to take a full backup to start your log chain over again.
请注意,如果您在做常规的数据库和日志备份,您可能需要先做一次完整备份,以便重新开始日志链。
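For illustration, that full backup might look like this (the database name and file path are assumptions carried over from the example above):

```sql
-- Restart the log chain after the batched load completes, so the
-- regular log backup schedule has a valid base to build on.
BACKUP DATABASE PrimaryDB TO DISK = 'C:\PrimaryDB_full.bak' WITH INIT;
```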
回答by QuickDraw
I know this is an old thread, but I made a generic version of Arthur's cursor solution:
我知道这是一个老帖子,但我基于 Arthur 的游标方案做了一个通用版本:
--Split a batch up into chunks using a cursor.
--This method can be used for most any large table with some modifications
--It could also be refined further with an @Day variable (for example)
DECLARE @Year INT;
DECLARE @Month INT;
DECLARE BatchingCursor CURSOR FOR
    SELECT DISTINCT YEAR(<SomeDateField>), MONTH(<SomeDateField>)
    FROM <Sometable>;
OPEN BatchingCursor;
FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
WHILE @@FETCH_STATUS = 0
BEGIN
    --All logic goes in here
    --Any select statements from <Sometable> need to be suffixed with:
    --WHERE YEAR(<SomeDateField>) = @Year AND MONTH(<SomeDateField>) = @Month
    FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
END;
CLOSE BatchingCursor;
DEALLOCATE BatchingCursor;
GO
This solved the problem on loads of our large tables.
这解决了我们加载大表时遇到的问题。
回答by Remus Rusanu
There is no pixie dust, you know that.
没有什么灵丹妙药,这您是知道的。
Without knowing specifics about the actual schema being transferred, a generic solution would be exactly as you describe it: divide processing into multiple inserts and keep track of the key(s). This is sort of pseudo-code T-SQL:
在不了解所传输的实际架构细节的情况下,通用的解决方案正如您所描述的:把处理拆分成多个插入,并跟踪键值。下面是类似伪代码的 T-SQL:
-- "table" and "key" are reserved words, so the tracking columns are renamed here
create table currentKeys (tableName sysname not null primary key, lastKey sql_variant not null);
go

declare @keysInserted table (insertedKey sql_variant);
declare @key sql_variant;

-- seed currentKeys with a sentinel key lower than any real key before the first run
begin transaction;
while (1 = 1)
begin
    select @key = lastKey from currentKeys where tableName = '<target>';

    insert into <target> (...)
    output inserted.<key> into @keysInserted (insertedKey)
    select top (<batchsize>) ... from <source>
    where <key> > @key
    order by <key>;

    if (0 = @@rowcount)
        break;

    update currentKeys
    set lastKey = (select max(insertedKey) from @keysInserted)
    where tableName = '<target>';

    commit;
    delete from @keysInserted;
    set @key = null;
    begin transaction;
end
commit;
It would get more complicated if you want to allow for parallel batches and partition the keys.
如果您想允许并行批处理并分区键,它会变得更加复杂。
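One way to sketch that extension (all table and column names here are hypothetical, not from the answer): give each worker its own key-range row, so parallel batches never overlap.

```sql
-- Hypothetical variant of the tracking table: one row per parallel worker,
-- each owning a disjoint key range of the source.
create table currentKeyRanges (
    workerId   int         not null primary key,
    rangeStart sql_variant not null,  -- inclusive lower bound of this worker's range
    rangeEnd   sql_variant not null,  -- exclusive upper bound
    lastKey    sql_variant     null   -- last key this worker has committed
);
-- Each worker runs the same batch loop as above, but filters its source rows with
--   where <key> > coalesce(@lastKey, @rangeStart) and <key> < @rangeEnd
-- so no two workers ever insert the same rows.
```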
回答by Raj More
You could use the BCP command to load the data and use the Batch Size parameter
您可以使用 BCP 命令加载数据,并使用批大小(Batch Size)参数。
http://msdn.microsoft.com/en-us/library/ms162802.aspx
Two-step process:
两步过程:
- BCP OUT data from Views into Text files
- BCP IN data from Text files into Tables with batch size parameter
- 将数据从视图 BCP OUT 到文本文件
- 使用批大小参数将数据从文本文件 BCP IN 到表中
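A hedged sketch of the two bcp invocations (the server, database, table, and file names are placeholders; `-n` uses native format, `-T` a trusted connection, and `-b` sets the batch size so each batch commits as its own transaction):

```
bcp "SELECT * FROM MyDB.dbo.HugeView" queryout huge_view.dat -n -S myserver -T
bcp MyDB.dbo.PrimaryTable in huge_view.dat -n -b 10000 -S myserver -T
```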