Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/751039/

Time: 2020-09-01 01:42:57  Source: igfitidea

Slow bulk insert for table with many indexes

Tags: sql, sql-server, indexing, bulkinsert

Asked by Ole Lynge

I try to insert millions of records into a table that has more than 20 indexes.

In the last run it took more than 4 hours per 100.000 rows, and the query was cancelled after 3½ days...

Do you have any suggestions about how to speed this up?

(I suspect the many indexes to be the cause. If you also think so, how can I automatically drop indexes before the operation, and then create the same indexes afterwards again?)

Extra info:

  • The space used by the indexes is about 4 times the space used by the data alone
  • The inserts are wrapped in a transaction per 100.000 rows.


Update on status:

The accepted answer helped me make it much faster.

Answered by Lucero

You can disable and enable the indexes. Note that disabling them can have unwanted side-effects (such as having duplicate primary keys or unique indices etc.) which will only be found when re-enabling the indexes.

--Disable Index
ALTER INDEX [IXYourIndex] ON YourTable DISABLE
GO

--Enable Index
ALTER INDEX [IXYourIndex] ON YourTable REBUILD
GO
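
The question also asks how to do this automatically for every index. A minimal sketch, assuming SQL Server 2008 or later and a placeholder table name `dbo.YourTable`: it builds one `ALTER INDEX ... DISABLE` statement per nonclustered index from `sys.indexes` and runs them. It deliberately skips the clustered index (disabling it makes the table inaccessible) and indexes that back primary key or unique constraints, so those constraints stay enforced during the load. The string concatenation pattern is a common shortcut, not a guaranteed-order construct.

```sql
-- Sketch only: dbo.YourTable is a placeholder for the real target table.
DECLARE @sql nvarchar(max) = N'';

-- Build one ALTER INDEX ... DISABLE statement per eligible index.
SELECT @sql = @sql
    + N'ALTER INDEX ' + QUOTENAME(i.name)
    + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(i.object_id))
    + N'.' + QUOTENAME(OBJECT_NAME(i.object_id))
    + N' DISABLE;' + NCHAR(13) + NCHAR(10)
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.YourTable')
  AND i.type = 2                  -- nonclustered rowstore indexes only
  AND i.is_disabled = 0
  AND i.is_primary_key = 0        -- keep constraint-backing indexes enabled
  AND i.is_unique_constraint = 0;

EXEC sys.sp_executesql @sql;

-- ... run the bulk insert here ...

-- Rebuilding all indexes also re-enables the disabled ones.
ALTER INDEX ALL ON dbo.YourTable REBUILD;
```

Because a disabled index keeps its definition in the catalog, nothing has to be scripted out and recreated afterwards; the final `REBUILD` is enough.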

Answered by cindi

This sounds like a data warehouse operation. It would be normal to drop the indexes before the insert and rebuild them afterwards.

When you rebuild the indexes, build the clustered index first, and conversely drop it last. They should all have fillfactor 100%.

The code should look something like this:

-- Snapshot the index names on the Fact table into a scratch table (IndexList)
if object_id('IndexList') is not null drop table IndexList
select name into IndexList from dbo.sysindexes where id = object_id('Fact')

-- Drop each index only if it currently exists (drop the clustered index last)
if exists (select name from IndexList where name = 'id1') drop index Fact.id1
if exists (select name from IndexList where name = 'id2') drop index Fact.id2
if exists (select name from IndexList where name = 'id3') drop index Fact.id3
.
.
-- BIG INSERT

-- RECREATE THE INDEXES
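
The "recreate the indexes" step could look roughly like the sketch below. The column names `DateKey` and `CustomerKey` are hypothetical, and which index is the clustered one is an assumption made for illustration; the index names follow the example above. The clustered index is built first so that the nonclustered indexes are only built once, and each index uses a fill factor of 100 as suggested.

```sql
-- Sketch only: DateKey and CustomerKey are hypothetical columns of Fact.
-- Build the clustered index first ...
CREATE CLUSTERED INDEX id1
    ON dbo.Fact (DateKey)
    WITH (FILLFACTOR = 100);

-- ... then the nonclustered indexes.
CREATE NONCLUSTERED INDEX id2
    ON dbo.Fact (CustomerKey)
    WITH (FILLFACTOR = 100);
```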

Answered by Richard

As noted by another answer, disabling indexes will be a very good start.

4 hours per 100.000 rows [...] The inserts are wrapped in a transaction per 100.000 rows.

You should look at reducing the number of rows per transaction. The server has to maintain a huge amount of state while in a transaction (so that it can be rolled back); this, along with the indexes, makes adding data very hard work.

Why not wrap each insert statement in its own transaction?

Also look at the nature of the SQL you are using: are you adding one row per statement (and network roundtrip), or adding many?
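
To illustrate the difference between one row per statement and many rows per statement, here is a minimal sketch assuming SQL Server 2008 or later and a hypothetical table `dbo.Fact (Id, Amount)`. A single `INSERT` with a multi-row `VALUES` list sends several rows in one statement and one roundtrip; a `VALUES` list used this way is limited to 1000 rows.

```sql
-- Sketch only: dbo.Fact and its columns (Id, Amount) are hypothetical.
BEGIN TRANSACTION;

-- One statement, one roundtrip, several rows.
INSERT INTO dbo.Fact (Id, Amount)
VALUES (1, 10.00),
       (2, 12.50),
       (3, 7.25);

COMMIT TRANSACTION;
```

The right batch and transaction size depends on the workload, so it is worth measuring a few sizes rather than assuming one.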

Answered by DannyB

Disabling and then re-enabling indices is frequently suggested in those cases. I have my doubts about this approach though, because:

(1) The application's DB user needs schema alteration privileges, which it normally should not possess. (2) The chosen insert approach and/or index schema might be less than optimal in the first place; otherwise rebuilding complete index trees should not be faster than some decent batch-inserting (e.g. the client issuing one insert statement at a time, causing thousands of server roundtrips; or a poor choice of clustered index, leading to constant index node splits).

That's why my suggestions look a little bit different:

  • Increase ADO.NET BatchSize
  • Choose the target table's clustered index wisely, so that inserts won't lead to clustered index node splits. Usually an identity column is a good choice
  • Let the client insert into a temporary heap table first (heap tables don't have any clustered index); then, issue one big "insert-into-select" statement to push all that staging table data into the actual target table
  • Apply SqlBulkCopy
  • Decrease transaction logging by choosing bulk-logged recovery model
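
The staging-table and recovery-model suggestions above can be sketched as follows. All names (`MyDb`, `dbo.Fact`, `dbo.Fact_Staging`, and the columns) are hypothetical, and `Id` is assumed to be the clustered key of the target and not an identity column. Switching the recovery model affects point-in-time restore, so treat that part as something to agree with whoever owns the backups.

```sql
-- Sketch only: MyDb, dbo.Fact, dbo.Fact_Staging and the columns are hypothetical.
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

-- Staging heap: no clustered index, no nonclustered indexes.
CREATE TABLE dbo.Fact_Staging
(
    Id     int            NOT NULL,
    Amount decimal(18, 2) NOT NULL
);

-- ... the client loads dbo.Fact_Staging here (for example with SqlBulkCopy) ...

-- One set-based statement moves everything into the indexed target table,
-- ordered by the assumed clustered key of dbo.Fact.
INSERT INTO dbo.Fact WITH (TABLOCK) (Id, Amount)
SELECT Id, Amount
FROM dbo.Fact_Staging
ORDER BY Id;

DROP TABLE dbo.Fact_Staging;

ALTER DATABASE MyDb SET RECOVERY FULL;
```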

You might find more detailed information in this article.
