Improve INSERT INTO - FROM SELECT, SQL Query (C#)
Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/541669/
Asked by Melursus
Currently I have this type of query generated programmatically (C#):
INSERT INTO TableName (Field1, Field2, Field3)
SELECT Field1, Field2, Field3 FROM TableName2
The problem is that the SELECT can return many records (like a million), so it takes a long time and the result is a connection timeout.
Also, if I split it into individual inserts (for this example, one million INSERT queries), it takes a very long time to execute ... but it works ...
Is there a way I can improve this type of query?
I use MSSQL 2005.
Thx
Accepted answer by Frederik Gheysels
I've found out that, if you have a lot of INSERT statements that are executed in sequence, you can improve performance by adding a 'GO' statement after every xxxx insert statements:
...
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
GO
INSERT INTO Table ( ... ) VALUES ( ... )
INSERT INTO Table ( ... ) VALUES ( ... )
...
Another possibility is to make sure that your INSERT INTO .. SELECT FROM query doesn't insert everything in one go, but instead uses some kind of paging technique:
INSERT INTO Table ...
SELECT ...
FROM OtherTable WHERE Id > x and Id < y
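For illustration, here is a rough C# sketch of that paging idea, driving the copy in key ranges from the application. The table and column names are taken from the question or made up, and it assumes TableName2 has an integer Id column; treat it as a sketch, not the answerer's exact code:

using System;
using System.Data.SqlClient;

class PagedCopy
{
    static void Main()
    {
        const int batchSize = 10000;   // rows per INSERT .. SELECT; tune to your data
        string connStr = "...";        // your connection string

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Find the highest Id we need to cover.
            long maxId;
            using (var cmd = new SqlCommand("SELECT ISNULL(MAX(Id), 0) FROM TableName2", conn))
                maxId = Convert.ToInt64(cmd.ExecuteScalar());

            // Copy one key range at a time so each statement stays short and commits quickly.
            for (long start = 0; start <= maxId; start += batchSize)
            {
                using (var cmd = new SqlCommand(
                    @"INSERT INTO TableName (Field1, Field2, Field3)
                      SELECT Field1, Field2, Field3 FROM TableName2
                      WHERE Id >= @start AND Id < @end", conn))
                {
                    cmd.Parameters.AddWithValue("@start", start);
                    cmd.Parameters.AddWithValue("@end", start + batchSize);
                    cmd.CommandTimeout = 120; // seconds; each small batch should finish well within this
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}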
Answer by Marc Gravell
Well, if it is a full copy, I wonder if you shouldn't look into bulk load tools?
- BULK INSERT (TSQL)
- SqlBulkCopy (.NET)
- bcp (command line)
- etc
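If you go the SqlBulkCopy route from .NET, a minimal sketch might look like this. The connection strings are placeholders and the table and column names come from the question; it streams rows from a reader on the source straight into the destination table:

using System.Data.SqlClient;

class BulkCopyExample
{
    static void Main()
    {
        string sourceConnStr = "...";   // source connection string
        string destConnStr = "...";     // destination connection string

        using (var source = new SqlConnection(sourceConnStr))
        using (var dest = new SqlConnection(destConnStr))
        {
            source.Open();
            dest.Open();

            using (var cmd = new SqlCommand("SELECT Field1, Field2, Field3 FROM TableName2", source))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(dest))
            {
                bulk.DestinationTableName = "TableName";
                bulk.BatchSize = 10000;       // rows sent per batch
                bulk.BulkCopyTimeout = 0;     // 0 = no timeout for the copy as a whole
                bulk.WriteToServer(reader);   // streams rows rather than buffering them all in memory
            }
        }
    }
}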
If you had a Where clause, I'd check that it was suitably indexed...
Additionally:
- perhaps drop indexes and triggers before doing the INSERT, and recreate them afterwards (see the sketch below)
- consider dropping the entire table and using SELECT INTO? (see comments)
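As a rough sketch of the "drop indexes first, recreate afterwards" idea: the index name IX_TableName_Field2 below is made up, and this only applies to nonclustered indexes (disabling the clustered index would make the table unreadable):

using System.Data.SqlClient;

static class DisableIndexesDuringLoad
{
    public static void Run(SqlConnection conn)
    {
        // Disable the nonclustered index so the insert doesn't have to maintain it row by row.
        Execute(conn, "ALTER INDEX IX_TableName_Field2 ON TableName DISABLE");

        Execute(conn, @"INSERT INTO TableName (Field1, Field2, Field3)
                        SELECT Field1, Field2, Field3 FROM TableName2");

        // Rebuilding re-enables the index in a single pass afterwards.
        Execute(conn, "ALTER INDEX IX_TableName_Field2 ON TableName REBUILD");
    }

    static void Execute(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0; // both the load and the rebuild can take a while
            cmd.ExecuteNonQuery();
        }
    }
}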
Answer by Anton Gogolev
Set the CommandTimeout property of the SqlCommand you're using to a sensible value (10 minutes or something). Remember that CommandTimeout is in seconds.
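In code, that would be something like the following; the helper and the insertSql parameter are just for illustration:

using System.Data.SqlClient;

static class LongRunningInsert
{
    public static void Run(SqlConnection conn, string insertSql)
    {
        using (var cmd = new SqlCommand(insertSql, conn))
        {
            // CommandTimeout is in seconds, so 600 = 10 minutes; 0 would mean wait indefinitely.
            cmd.CommandTimeout = 600;
            cmd.ExecuteNonQuery();
        }
    }
}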
Answer by SAMills
You don't indicate what problem you're solving with this approach. Obviously a WHERE clause would narrow the record set. But if the result set isn't going to be modified in the new table, then why replicate the data at all? Why not query directly from the source?
Answer by SQLMenace
Either bulk load by using a file and then bcp/BULK INSERT, or batch it up in batches of 5K or so.
Answer by HLGEM
In the first place, never try to insert a million records through C#. Never process large groups of records one at a time. This is work that should be done at the database, by the database. Use bulk insert, SSIS, or DTS to do this, and then schedule it as a job during off hours. If it still takes too long, then I suggest you run it in batches of several thousand (you will have to experiment with your own database to see what the best choice is, as the number you can safely process depends greatly on the tables, the indexing, how fast your server is, and how many users are also trying to do work against the same tables).
Answer by Joshua Cauble
Another way that we have used in the past is to create a temp table with the primary keys we want to move and use a while loop. This way you can do it in a kind of block fashion, so you avoid the large transaction overhead if you cancelled and it had to roll back.
Basically what you end up doing is an insert into tablename (...) select (...) from tablename where primarykey in (select top 10000 key from temptable)
You want that top 10000 in a secondary result set as well, so you can remove them from the temp table and they don't get processed again.
Yet another way would be to use cursors to reduce the number of records you process at a time.
Another loop method would be to do something like this in a while loop.
declare @stop as int
set @stop = (select count(primaryKey) from sourceTable
             where primaryKey not in (select primaryKey from destinationTable))

while (@stop > 0)
begin
    begin transaction

    -- copy at most 10000 rows per pass so each transaction stays small
    insert into destinationTable (...)
    select top (10000) (...) from sourceTable
    where primaryKey not in (select primaryKey from destinationTable)

    commit

    set @stop = (select count(primaryKey) from sourceTable
                 where primaryKey not in (select primaryKey from destinationTable))
end
Not the most efficient, but it would work and should allow you to keep the transaction log down. Unless you need the locking, also make sure to use the NOLOCK hint so that you don't block other transactions when doing this large move (unless you use BCP or DTS, as they are much, much faster).
Some of what has been said is probably your best bet though. Use BCP, DTS, or some other bulk tool. If you can drop indexes, it will make things go a lot faster.
Answer by Cade Roux
Some good answers here.
Just like to add that if you have indexes on the destination table, they will slow down the operation. However, rebuilding the index can sometimes take a long time if you use the drop-and-recreate technique.
If you don't want to drop the indexes, use an ORDER BY in your SELECT that matches the destination clustered index; this seems to help (probably helps minimize the page splits).
Answer by Brian Rudolph
Ok, there are a few fundamental issues.
I/O - Inserting into a table while reading from another table will most likely cause disk contention if the tables are not on separate disks. Put the opposing tables onto physically different spindles.
Transaction log - You need to ensure that your transaction log is on its own disk, or work in smaller transactions (a few thousand rows at a time), or use BCP/Bulk Insert, which is not logged.
Clustered indexes - If you are inserting all of these rows into a target table, and its clustered index (the physical order in which data is written to disk) is not being written sequentially, the disk I/O requirements go through the roof because of page splits and re-allocation. An easy fix can be to create a clustered index on the recipient table that is a sequentially seeded key. This will generally ensure that you get sequential writes into the table, almost always at the end.
File expansion - Ensure that you have SQL set to expand its files at a decent rate, like 10% or so. Otherwise it will have to constantly resize its files and zero out the disk. There are ways to prevent it from having to zero out the disk as well, like enabling the Bulk File Operation permission in your group policies for the SQL Service user.
Quite frankly, other than that and a few of the other suggestions, it is highly unlikely you will make an insert of millions of rows in a single transaction truly fast. If you did this via Bulk Insert, it would be drastically faster, though it might not be what you need from an application perspective.
Answer by Will Rickards
Have you tested the SQL through SQL Server Management Studio to see how long it actually takes? I would start there. You may be able to improve the performance of the SELECT. And you might be able to improve the performance with TABLOCK hints on the table you're inserting into.
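For the TABLOCK idea, a hedged sketch follows: the hint goes on the target of the INSERT, and since it takes a table-level lock, other writers are blocked while it runs. The table and column names are from the question:

using System.Data.SqlClient;

static class TablockInsert
{
    public static void Run(SqlConnection conn)
    {
        // WITH (TABLOCK) locks the whole target table, which cuts locking overhead and,
        // depending on the recovery model and the target, can allow minimal logging.
        const string sql = @"INSERT INTO TableName WITH (TABLOCK) (Field1, Field2, Field3)
                             SELECT Field1, Field2, Field3 FROM TableName2";

        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0; // millions of rows can take a while
            cmd.ExecuteNonQuery();
        }
    }
}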