Multiple INSERT statements vs. single INSERT with multiple VALUES

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share this content, you must do so under the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/8635818/

Tags: sql, sql-server, performance, sql-server-2008, tsql

Asked by Borka

I'm running a performance comparison between using 1000 INSERT statements:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0)
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1)
...
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999)

...versus using a single INSERT statement with 1000 values:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
VALUES 
('db72b358-e9b5-4101-8d11-7d7ea3a0ae7d', 'First 0', 'Last 0', 0),
('6a4874ab-b6a3-4aa4-8ed4-a167ab21dd3d', 'First 1', 'Last 1', 1),
...
('9d7f2a58-7e57-4ed4-ba54-5e9e335fb56c', 'First 999', 'Last 999', 999)

To my big surprise, the results are the opposite of what I thought:

  • 1000 INSERT statements: 290 msec.
  • 1 INSERT statement with 1000 VALUES: 2800 msec.

The test is executed directly in MSSQL Management Studio, with SQL Server Profiler used for measurement (and I've got similar results running it from C# code using SqlClient, which is even more surprising considering all the DAL layer roundtrips).

Can this be reasonable, or somehow explained? How come a supposedly faster method results in 10 times (!) worse performance?

Thank you.

EDIT: Attaching execution plans for both: Exec Plans

Answered by Martin Smith

Addition: SQL Server 2012 shows some improved performance in this area but doesn't seem to tackle the specific issues noted below. This should apparently be fixed in the next major version after SQL Server 2012!

Your plan shows the single inserts are using parameterised procedures (possibly auto parameterised) so parse/compile time for these should be minimal.

I thought I'd look into this a bit more though, so I set up a loop (script) and tried adjusting the number of VALUES clauses while recording the compile time (a rough sketch of this kind of loop is below).
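
A minimal sketch of such a measuring loop (assuming the T_TESTS table from the question exists, and a test server where clearing the plan cache is acceptable; this is not the author's original script):

DECLARE @N INT = 250;  -- number of VALUES clauses to test in this pass
DECLARE @i INT = 0, @Sql NVARCHAR(MAX);

-- Build one INSERT with @N literal row value expressions
SET @Sql = N'INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) VALUES ';
WHILE @i < @N
BEGIN
    SET @Sql = @Sql
             + CASE WHEN @i > 0 THEN N',' ELSE N'' END
             + N'(''' + CAST(NEWID() AS NVARCHAR(36)) + N''',''First '
             + CAST(@i AS NVARCHAR(10)) + N''',''Last '
             + CAST(@i AS NVARCHAR(10)) + N''','
             + CAST(@i AS NVARCHAR(10)) + N')';
    SET @i = @i + 1;
END;

DBCC FREEPROCCACHE;  -- force a fresh compile (never run this on production!)
EXEC (@Sql);

-- Read the compile statistics back out of the cached showplan XML
-- (for small batches the auto-parameterized copy is cached under a
-- different, parameterized text, so the LIKE filter may need adjusting)
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT qp.query_plan.value('(//QueryPlan/@CompileTime)[1]',    'int') AS CompileTime,
       qp.query_plan.value('(//QueryPlan/@CompileMemory)[1]',  'int') AS CompileMemory,
       qp.query_plan.value('(//QueryPlan/@CachedPlanSize)[1]', 'int') AS CachedPlanSize
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE N'INSERT INTO T_TESTS%';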

I then divided the compile time by the number of rows to get the average compile time per clause. The results are below.

Graph

Up until 250 VALUES clauses are present, the compile time / number of clauses has a slight upward trend, but nothing too dramatic.

Graph

But then there is a sudden change.

That section of the data is shown below.

+------+----------------+-------------+---------------+---------------+
| Rows | CachedPlanSize | CompileTime | CompileMemory | Duration/Rows |
+------+----------------+-------------+---------------+---------------+
|  245 |            528 |          41 |          2400 | 0.167346939   |
|  246 |            528 |          40 |          2416 | 0.162601626   |
|  247 |            528 |          38 |          2416 | 0.153846154   |
|  248 |            528 |          39 |          2432 | 0.157258065   |
|  249 |            528 |          39 |          2432 | 0.156626506   |
|  250 |            528 |          40 |          2448 | 0.16          |
|  251 |            400 |         273 |          3488 | 1.087649402   |
|  252 |            400 |         274 |          3496 | 1.087301587   |
|  253 |            400 |         282 |          3520 | 1.114624506   |
|  254 |            408 |         279 |          3544 | 1.098425197   |
|  255 |            408 |         290 |          3552 | 1.137254902   |
+------+----------------+-------------+---------------+---------------+

The cached plan size, which had been growing linearly, suddenly drops, but CompileTime increases 7-fold and CompileMemory shoots up. This is the cut-off point between the plan being an auto-parameterized one (with 1,000 parameters) and a non-parameterized one. Thereafter it seems to get linearly less efficient (in terms of the number of value clauses processed in a given time).
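
One way to observe this cut-off yourself (a sketch, not part of the original answer): below the threshold the statement is cached as a "Prepared", auto-parameterized plan whose text contains @1, @2, ... placeholders; above it, it is cached as an "Adhoc" plan with the literals inlined.

SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE st.text LIKE N'%INSERT INTO T_TESTS%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';  -- exclude this query itself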

Not sure why this should be. Presumably when it is compiling a plan for specific literal values it must perform some activity that does not scale linearly (such as sorting).

When I tried a query consisting entirely of duplicate rows it didn't seem to affect the size of the cached query plan, and it doesn't affect the order of the output of the table of constants either (and as you are inserting into a heap, time spent sorting would be pointless anyway, even if it did).

Moreover if a clustered index is added to the table the plan still shows an explicit sort step so it doesn't seem to be sorting at compile time to avoid a sort at run time.

Plan

I tried to look at this in a debugger, but the public symbols for my version of SQL Server 2008 don't seem to be available, so instead I had to look at the equivalent UNION ALL construction in SQL Server 2005.

A typical stack trace is below:

sqlservr.exe!FastDBCSToUnicode()  + 0xac bytes  
sqlservr.exe!nls_sqlhilo()  + 0x35 bytes    
sqlservr.exe!CXVariant::CmpCompareStr()  + 0x2b bytes   
sqlservr.exe!CXVariantPerformCompare<167,167>::Compare()  + 0x18 bytes  
sqlservr.exe!CXVariant::CmpCompare()  + 0x11f67d bytes  
sqlservr.exe!CConstraintItvl::PcnstrItvlUnion()  + 0xe2 bytes   
sqlservr.exe!CConstraintProp::PcnstrUnion()  + 0x35e bytes  
sqlservr.exe!CLogOp_BaseSetOp::PcnstrDerive()  + 0x11a bytes    
sqlservr.exe!CLogOpArg::PcnstrDeriveHandler()  + 0x18f bytes    
sqlservr.exe!CLogOpArg::DeriveGroupProperties()  + 0xa9 bytes   
sqlservr.exe!COpArg::DeriveNormalizedGroupProperties()  + 0x40 bytes    
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x18a bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!CQuery::PqoBuild()  + 0x3cb bytes  
sqlservr.exe!CStmtQuery::InitQuery()  + 0x167 bytes 
sqlservr.exe!CStmtDML::InitNormal()  + 0xf0 bytes   
sqlservr.exe!CStmtDML::Init()  + 0x1b bytes 
sqlservr.exe!CCompPlan::FCompileStep()  + 0x176 bytes   
sqlservr.exe!CSQLSource::FCompile()  + 0x741 bytes  
sqlservr.exe!CSQLSource::FCompWrapper()  + 0x922be bytes    
sqlservr.exe!CSQLSource::Transform()  + 0x120431 bytes  
sqlservr.exe!CSQLSource::Compile()  + 0x2ff bytes   

So going off the names in the stack trace it appears to spend a lot of time comparing strings.

This KB article indicates that DeriveNormalizedGroupProperties is associated with what used to be called the normalization stage of query processing.

This stage is now called binding or algebrizing and it takes the expression parse tree output from the previous parse stage and outputs an algebrized expression tree (query processor tree) to go forward to optimization (trivial plan optimization in this case) [ref].

I tried one more experiment (Script), which was to re-run the original test but looking at three different cases:

  1. First Name and Last Name Strings of length 10 characters with no duplicates.
  2. First Name and Last Name Strings of length 50 characters with no duplicates.
  3. First Name and Last Name Strings of length 10 characters with all duplicates.

Graph

It can clearly be seen that the longer the strings are, the worse things get, and conversely, the more duplicates there are, the better things get. As previously mentioned, duplicates don't affect the cached plan size, so I presume that there must be a process of duplicate identification when constructing the algebrized expression tree itself.

Edit

One place where this information is leveraged is shown by @Lieven here:

SELECT * 
FROM (VALUES ('Lieven1', 1),
             ('Lieven2', 2),
             ('Lieven3', 3)) Test (name, ID)
ORDER BY name, 1/ (ID - ID) 

Because at compile time it can determine that the Name column has no duplicates, it skips ordering by the secondary 1/ (ID - ID) expression at run time (the sort in the plan only has one ORDER BY column), and no divide-by-zero error is raised. If duplicates are added to the table then the sort operator shows two order-by columns and the expected error is raised, as in the sketch below.
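
For illustration (a sketch, not from the original answer), introducing a duplicate name means the optimizer can no longer prove uniqueness, so the secondary expression must be evaluated and the query fails:

SELECT *
FROM (VALUES ('Lieven1', 1),
             ('Lieven1', 2),   -- duplicate name
             ('Lieven3', 3)) Test (name, ID)
ORDER BY name, 1 / (ID - ID);  -- now raises: Divide by zero error encountered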

Answered by dasblinkenlight

It is not too surprising: the execution plan for the tiny insert is computed once, and then reused 1000 times. Parsing and preparing the plan is quick, because it has only four values to deal with. A 1000-row plan, on the other hand, needs to deal with 4000 values (or 4000 parameters if you parameterized your C# tests). This could easily eat up the time savings you gain by eliminating 999 roundtrips to SQL Server, especially if your network is not overly slow.

Answered by RickNZ

The issue probably has to do with the time it takes to compile the query.

If you want to speed up the inserts, what you really need to do is wrap them in a transaction:

BEGIN TRAN;
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0);
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1);
...
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999);
COMMIT TRAN;

From C#, you might also consider using a table-valued parameter (sketched below). Issuing multiple commands in a single batch, by separating them with semicolons, is another approach that will also help.
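
A minimal sketch of the table-valued parameter approach (the type and procedure names here are illustrative, not from the original answer):

-- A table type matching the rows being inserted
CREATE TYPE dbo.TestRowType AS TABLE
(
    TestId    UNIQUEIDENTIFIER,
    FirstName VARCHAR(50),
    LastName  VARCHAR(50),
    Age       INT
);
GO

-- A procedure that inserts every row in the parameter with one statement
CREATE PROCEDURE dbo.InsertTests
    @Rows dbo.TestRowType READONLY
AS
BEGIN
    INSERT INTO T_TESTS (TestId, FirstName, LastName, Age)
    SELECT TestId, FirstName, LastName, Age
    FROM @Rows;
END;

From C#, you would fill a DataTable with the 1000 rows and pass it as a SqlParameter with SqlDbType.Structured, so the whole set travels to the server in a single roundtrip.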

Answered by uceumern

I ran into a similar situation trying to convert a table with several 100k rows with a C++ program (MFC/ODBC).

Since this operation took a very long time, I figured bundling multiple inserts into one would help (up to 1000, due to the MSSQL limit on row value expressions per INSERT). My guess was that a lot of single insert statements would create an overhead similar to what is described here.

However, it turns out that the conversion actually took quite a bit longer:

+---------+---------------+--------------+----------------+
|         | Method 1      | Method 2     | Method 3       |
|         | Single Insert | Multi Insert | Joined Inserts |
+---------+---------------+--------------+----------------+
| Rows    | 1000          | 1000         | 1000           |
| Insert  | 390 ms        | 765 ms       | 270 ms         |
| per Row | 0.390 ms      | 0.765 ms     | 0.270 ms       |
+---------+---------------+--------------+----------------+

So, 1000 single calls to CDatabase::ExecuteSql each with a single INSERT statement (method 1) are roughly twice as fast as a single call to CDatabase::ExecuteSql with a multi-line INSERT statement with 1000 value tuples (method 2).

Update: So, the next thing I tried was to bundle 1000 separate INSERT statements into a single string and have the server execute that (method 3). It turns out this is even a bit faster than method 1.

Edit: I am using Microsoft SQL Server Express Edition (64-bit) v10.0.2531.0
