What is the recommended batch size for .NET SqlBulkCopy?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/779690/

What is the recommended batch size for SqlBulkCopy?

Tags: .net, sql-server, performance, ado.net

Asked by Jonathan Allen

What is the recommended batch size for SqlBulkCopy? I'm looking for a general formula I can use as a starting point for performance tuning.

Answered by Alric

I have an import utility sitting on the same physical server as my SQL Server instance. Using a custom IDataReader, it parses flat files and inserts them into a database using SqlBulkCopy. A typical file has about 6M qualified rows, averaging 5 columns of decimal and short text, about 30 bytes per row.

Given this scenario, I found a batch size of 5,000 to be the best compromise of speed and memory consumption. I started with 500 and experimented with larger. I found 5000 to be 2.5x faster, on average, than 500. Inserting the 6 million rows takes about 30 seconds with a batch size of 5,000 and about 80 seconds with batch size of 500.
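
A minimal sketch of that setup (not from the original answer): FlatFileDataReader stands in for the custom IDataReader, and the connection string and destination table name are placeholders.

    using System.Data.SqlClient;

    // FlatFileDataReader is a hypothetical custom IDataReader over the flat file.
    using (var reader = new FlatFileDataReader(@"C:\data\import.dat"))
    using (var bulk = new SqlBulkCopy("Server=.;Database=Staging;Integrated Security=true"))
    {
        bulk.DestinationTableName = "dbo.ImportTarget"; // placeholder table
        bulk.BatchSize = 5000;       // the sweet spot described above
        bulk.WriteToServer(reader);  // streams the rows in 5,000-row batches
    }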

10,000 was not measurably faster. Moving up to 50,000 improved the speed by a few percentage points but it's not worth the increased load on the server. Above 50,000 showed no improvements in speed.

This isn't a formula, but it's another data point for you to use.

Answered by MagicAndi

This is an issue I have also spent some time looking into. I am looking to optimize importing large CSV files (16+ GB, 65+ million records, and growing) into a SQL Server 2005 database using a C# console application (.NET 2.0). As Jeremy has already pointed out, you will need to do some fine-tuning for your particular circumstances, but I would recommend you start with an initial batch size of 500 and test values both above and below this.

I got the recommendation to test values between 100 and 1000 for batch size from this MSDN forum post, and was skeptical. But when I tested batch sizes between 100 and 10,000, I found that 500 was the optimal value for my application. The 500 value for SqlBulkCopy.BatchSize is also recommended here.

To further optimize your SqlBulkCopy operation, check out this MSDN advice; I find that using SqlBulkCopyOptions.TableLock helps to reduce loading time.
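
A hedged sketch of wiring that in: TableLock is passed through the SqlBulkCopy constructor overload that accepts SqlBulkCopyOptions. Here connectionString and reader are placeholders, and the table name is invented.

    using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulk.DestinationTableName = "dbo.ImportTarget"; // placeholder
        bulk.BatchSize = 500;        // the value that tested best above
        bulk.WriteToServer(reader);  // reader is any IDataReader over the CSV
    }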

Answered by Ray

As others have stated, it depends on your environment, specifically the row volume and network latency.

Personally, I'd start by setting the BatchSize property to 1000 rows and see how that performs. If it works, then I keep doubling the number of rows (e.g. to 2000, 4000, etc.) until I get a timeout.

Otherwise, if a timeout occurs at 1000, then I decrease the number of rows by half (e.g. 500) until it works.

In each case, I keep doubling (if successful) or halving (if failed) the difference between each of the last two attempted batch sizes until finding a sweet spot.

The other factor to consider is how long it takes to copy a single batch of rows. Timeouts will occur if copying a batch takes longer than the BulkCopyTimeout property, which defaults to 30 seconds. You might try doubling the BulkCopyTimeout property to 60 seconds. This allows a longer period of time for a larger batch of rows to be copied. For example, a batch of 50,000 rows might take around 40 seconds, just exceeding the 30-second time limit, so bumping it up to 60 seconds might help with the performance.
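
A sketch of one way to mechanize that search, with BulkCopyTimeout bumped to 60 seconds as suggested. openReader is a hypothetical factory that returns a fresh IDataReader over the source data for each attempt (a reader can only be consumed once), and the table name and stopping window are placeholders.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class BatchSizeTuner
    {
        public static int FindBatchSize(string connectionString, Func<IDataReader> openReader)
        {
            int largestOk = 0;    // largest batch size that completed without timing out
            int smallestBad = 0;  // smallest batch size that timed out (0 = none yet)
            int candidate = 1000; // starting point suggested above

            while (true)
            {
                if (TryBulkCopy(connectionString, openReader, candidate))
                    largestOk = candidate;
                else
                    smallestBad = candidate;

                if (smallestBad == 0)
                {
                    candidate *= 2; // keep doubling while every attempt succeeds
                }
                else if (largestOk == 0)
                {
                    if (candidate == 1)
                        throw new InvalidOperationException("Even a batch size of 1 timed out.");
                    candidate = Math.Max(candidate / 2, 1); // keep halving while every attempt fails
                }
                else if (smallestBad - largestOk <= 250) // arbitrary stopping window
                {
                    return largestOk;
                }
                else
                {
                    candidate = (largestOk + smallestBad) / 2; // halve the gap between the last two attempts
                }
            }
        }

        static bool TryBulkCopy(string connectionString, Func<IDataReader> openReader, int batchSize)
        {
            try
            {
                using (var reader = openReader())
                using (var bulk = new SqlBulkCopy(connectionString))
                {
                    bulk.DestinationTableName = "dbo.ImportTarget"; // placeholder
                    bulk.BatchSize = batchSize;
                    bulk.BulkCopyTimeout = 60; // double the 30-second default
                    bulk.WriteToServer(reader);
                }
                return true;
            }
            catch (SqlException)
            {
                return false; // a timeout surfaces as a SqlException
            }
        }
    }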

Answered by Jeremy

This all depends on your implementation.

What kind of speed can you expect on your network? Are you using it in Forms or ASP.Net? Do you need to alert the user of progress? What is the size of the total job?

In my experience, running bulk copy without a batch size specified will cause timeout issues. I like to start with something like 1000 records and make some adjustments from there.

Answered by A Mustapha

I tried multiple sizes; in my case, 5,000 was good.
