在原生 SQL 中对大型 INSERT 操作进行批量提交?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1602244/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Batch commit on large INSERT operation in native SQL?
提问by Cade Roux
I have a couple large tables (188m and 144m rows) I need to populate from views, but each view contains a few hundred million rows (pulling together pseudo-dimensionally modelled data into a flat form). The keys on each table are over 50 composite bytes of columns. If the data was in tables, I could always think about using sp_rename to make the other new table, but that isn't really an option.
我有几个大表(188m 和 144m 行)我需要从视图中填充,但每个视图都包含几亿行(将伪维度建模数据合并为平面形式)。每个表上的键都超过 50 个组合字节的列。如果数据在表中,我总是可以考虑使用 sp_rename 来创建另一个新表,但这并不是一个真正的选择。
If I do a single INSERT operation, the process uses a huge amount of transaction log space, typically filling it up and prompting a bunch of hassle with the DBAs. (And yes, this is probably a job the DBAs should handle/design/architect)
如果我执行单次 INSERT 操作,这个过程会占用大量事务日志空间,通常会把日志填满,给 DBA 带来一堆麻烦。(是的,这可能本就该由 DBA 来处理/设计/规划)
I can use SSIS and stream the data into the destination table with batch commits (but this does require the data to be transmitted over the network, since we are not allowed to run SSIS packages on the server).
我可以使用 SSIS 并通过批量提交将数据流式传输到目标表中(但这确实需要通过网络传输数据,因为我们不允许在服务器上运行 SSIS 包)。
Is there anything other than dividing the process up into multiple INSERT operations, using some kind of key to distribute the rows into different batches, and doing a loop?
除了把过程拆分成多个 INSERT 操作、用某种键把行分配到不同批次并循环执行之外,还有其他办法吗?
采纳答案by Arthur
You could partition your data and insert it in a cursor loop. That would be nearly the same as SSIS batch inserting, but it runs on your server.
您可以对数据进行分区,并在游标循环中分批插入数据。这与 SSIS 的批量插入几乎相同,但它运行在您的服务器上。
DECLARE @year INT, @month INT;
DECLARE c CURSOR FOR
    SELECT DISTINCT YEAR(DateCol), MONTH(DateCol) FROM whatever;
OPEN c;
FETCH NEXT FROM c INTO @year, @month;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO yourtable(...)
    SELECT * FROM whatever
    WHERE YEAR(DateCol) = @year AND MONTH(DateCol) = @month;
    FETCH NEXT FROM c INTO @year, @month;
END
CLOSE c;
DEALLOCATE c;
回答by Aaron Bertrand
Does the view have ANY kind of unique identifier / candidate key? If so, you could select those rows into a working table using:
视图是否有任何类型的唯一标识符/候选键?如果是这样,您可以使用以下方法将这些行选择到工作表中:
SELECT key_columns INTO dbo.temp FROM dbo.HugeView;
(If it makes sense, maybe put this table into a different database, perhaps with SIMPLE recovery model, to prevent the log activity from interfering with your primary database. This should generate much less log anyway, and you can free up the space in the other database before you resume, in case the problem is that you have inadequate disk space all around.)
(如果合适,可以把这个表放到另一个数据库中,例如使用 SIMPLE 恢复模式,以防止日志活动干扰您的主数据库。这样产生的日志会少得多;而且如果问题恰恰是各处磁盘空间都不足,您还可以在继续之前先释放那个数据库中的空间。)
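As a minimal sketch of that staging step (the StagingDB name and the explicit SIMPLE recovery setting are illustrative assumptions, not part of the answer):

```sql
-- Hypothetical staging database in SIMPLE recovery, so capturing the keys
-- generates minimal log and cannot fill up the primary database's log.
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Capture only the candidate-key columns from the huge view.
SELECT key_columns INTO StagingDB.dbo.temp FROM dbo.HugeView;
```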
Then you can do something like this, inserting 10,000 rows at a time, and backing up the log in between:
然后你可以做这样的事情,一次插入 10,000 行,并在两者之间备份日志:
SET NOCOUNT ON;

DECLARE
    @batchsize INT,
    @ctr INT,
    @rc INT;

SELECT
    @batchsize = 10000,
    @ctr = 0;

WHILE 1 = 1
BEGIN
    ;WITH x AS
    (
        SELECT key_column, rn = ROW_NUMBER() OVER (ORDER BY key_column)
        FROM dbo.temp
    )
    INSERT dbo.PrimaryTable(a, b, c, etc.)
    SELECT v.a, v.b, v.c, etc.
    FROM x
    INNER JOIN dbo.HugeView AS v
        ON v.key_column = x.key_column
    WHERE x.rn > @batchsize * @ctr
        AND x.rn <= @batchsize * (@ctr + 1);

    IF @@ROWCOUNT = 0
        BREAK;

    BACKUP LOG PrimaryDB TO DISK = 'C:\db.bak' WITH INIT;

    SET @ctr = @ctr + 1;
END
That's all off the top of my head, so don't cut/paste/run, but I think the general idea is there. For more details (and why I backup log / checkpoint inside the loop), see this post on sqlperformance.com:
这些都是我凭印象随手写的,所以不要直接剪切/粘贴/运行,但我认为大体思路是对的。有关更多细节(以及为什么我在循环内备份日志/做检查点),请参阅 sqlperformance.com 上的这篇文章:
Note that if you are taking regular database and log backups, you will probably want to take a full backup to start your log chain over again.
请注意,如果您在做常规的数据库和日志备份,您可能需要先做一次完整备份,以便重新开始日志链。
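For illustration, that full backup might look like this (the database name and file path are assumptions carried over from the example above):

```sql
-- Restart the log chain after the batched load completes, so the
-- regular log backup schedule has a valid base to build on.
BACKUP DATABASE PrimaryDB TO DISK = 'C:\PrimaryDB_full.bak' WITH INIT;
```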
回答by QuickDraw
I know this is an old thread, but I made a generic version of Arthur's cursor solution:
我知道这是一个老帖子,但我基于 Arthur 的游标方案做了一个通用版本:
--Split a batch up into chunks using a cursor.
--This method can be used for most any large table with some modifications
--It could also be refined further with an @Day variable (for example)
DECLARE @Year INT;
DECLARE @Month INT;
DECLARE BatchingCursor CURSOR FOR
    SELECT DISTINCT YEAR(<SomeDateField>), MONTH(<SomeDateField>)
    FROM <Sometable>;
OPEN BatchingCursor;
FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
WHILE @@FETCH_STATUS = 0
BEGIN
    --All logic goes in here
    --Any select statements from <Sometable> need to be suffixed with:
    --WHERE YEAR(<SomeDateField>) = @Year AND MONTH(<SomeDateField>) = @Month
    FETCH NEXT FROM BatchingCursor INTO @Year, @Month;
END;
CLOSE BatchingCursor;
DEALLOCATE BatchingCursor;
GO
This solved the problem on loads of our large tables.
这解决了我们加载大表时遇到的问题。
回答by Remus Rusanu
There is no pixie dust, you know that.
没有什么灵丹妙药,这您是知道的。
Without knowing specifics about the actual schema being transferred, a generic solution would be exactly as you describe it: divide processing into multiple inserts and keep track of the key(s). This is sort of pseudo-code T-SQL:
在不了解所传输的实际架构细节的情况下,通用的解决方案正如您所描述的:把处理拆分成多个插入,并跟踪键值。下面是类似伪代码的 T-SQL:
-- "table" and "key" are reserved words, so the tracking columns are renamed here
create table currentKeys (tableName sysname not null primary key, lastKey sql_variant not null);
go

declare @keysInserted table (insertedKey sql_variant);
declare @key sql_variant;

-- seed currentKeys with a sentinel key lower than any real key before the first run
begin transaction;
while (1 = 1)
begin
    select @key = lastKey from currentKeys where tableName = '<target>';

    insert into <target> (...)
    output inserted.<key> into @keysInserted (insertedKey)
    select top (<batchsize>) ... from <source>
    where <key> > @key
    order by <key>;

    if (0 = @@rowcount)
        break;

    update currentKeys
    set lastKey = (select max(insertedKey) from @keysInserted)
    where tableName = '<target>';

    commit;
    delete from @keysInserted;
    set @key = null;
    begin transaction;
end
commit;
It would get more complicated if you want to allow for parallel batches and partition the keys.
如果您想允许并行批处理并分区键,它会变得更加复杂。
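One way to sketch that extension (all table and column names here are hypothetical, not from the answer): give each worker its own key-range row, so parallel batches never overlap.

```sql
-- Hypothetical variant of the tracking table: one row per parallel worker,
-- each owning a disjoint key range of the source.
create table currentKeyRanges (
    workerId   int         not null primary key,
    rangeStart sql_variant not null,  -- inclusive lower bound of this worker's range
    rangeEnd   sql_variant not null,  -- exclusive upper bound
    lastKey    sql_variant     null   -- last key this worker has committed
);
-- Each worker runs the same batch loop as above, but filters its source rows with
--   where <key> > coalesce(@lastKey, @rangeStart) and <key> < @rangeEnd
-- so no two workers ever insert the same rows.
```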
回答by Raj More
You could use the BCP command to load the data and use the Batch Size parameter
您可以使用 BCP 命令加载数据,并使用批大小(Batch Size)参数。
http://msdn.microsoft.com/en-us/library/ms162802.aspx
Two-step process:
两步过程:
- BCP OUT data from Views into Text files
- BCP IN data from Text files into Tables with batch size parameter
- 将数据从视图 BCP OUT 到文本文件
- 使用批大小参数将数据从文本文件 BCP IN 到表中
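A hedged sketch of the two bcp invocations (the server, database, table, and file names are placeholders; `-n` uses native format, `-T` a trusted connection, and `-b` sets the batch size so each batch commits as its own transaction):

```
bcp "SELECT * FROM MyDB.dbo.HugeView" queryout huge_view.dat -n -S myserver -T
bcp MyDB.dbo.PrimaryTable in huge_view.dat -n -b 10000 -S myserver -T
```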