SQL UPDATE WHERE IN (List) 还是 UPDATE 每个?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/33205087/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL UPDATE WHERE IN (List) or UPDATE each individually?
提问by Sean Missingham
I'm doing my best lately to look for the best way to run certain queries in SQL that could potentially be done multiple different ways. Among my research I've come across quite a lot of hate for the WHERE IN concept, due to an inherent inefficiency in how it works.
我最近尽我最大的努力寻找在 SQL 中运行某些查询的最佳方法,这些查询可能以多种不同的方式完成。在我的研究中,我遇到了很多对 WHERE IN 概念的厌恶,因为它的工作方式固有的低效率。
eg: WHERE Col IN (val1, val2, val3)
例如: WHERE Col IN (val1, val2, val3)
In my current project, I'm doing an UPDATE on a large set of data and am wondering which of the following is more efficient: (or whether a better option exists)
在我当前的项目中,我正在对大量数据进行更新,并且想知道以下哪个更有效:(或者是否存在更好的选择)
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (id1, id2, id3 ....);
In the above, the list of ID's can be up to 1.5k ID's.
在上面,ID 的列表最多可以有 1.5k 个 ID。
VS
VS
Looping through all ID's in code, and running the following statement for each:
循环遍历代码中的所有 ID,并为每个 ID 运行以下语句:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID = 'theID';
To myself, it seems more logical that the former would work better / faster, because there's less queries to run. That said, I'm not 100% familiar with the in's and out's of SQL and how query queueing works.
对我自己来说,前者工作得更好/更快似乎更合乎逻辑,因为要运行的查询更少。也就是说,我并不是 100% 熟悉 SQL 的来龙去脉以及查询队列的工作原理。
I'm also unsure as to which would be friendlier on the DB as far as table locks and other general performance.
我也不确定就表锁和其他一般性能而言,哪个对数据库更友好。
General info in case it helps, I'm using Microsoft SQL Server 2014, and the primary development language is C#.
一般信息以防万一,我使用的是 Microsoft SQL Server 2014,主要开发语言是 C#。
Any help is much appreciated.
任何帮助深表感谢。
EDIT:
编辑:
Option 3:
选项 3:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);
In the above, @definedTable is a SQL 'User Defined Table Type', where the data inside comes through to a stored procedure as (in C#) type SqlDbType.Structured
在上面,@definedTable 是一个 SQL '用户定义表类型',其中内部的数据作为(在 C# 中)类型 SqlDbType.Structured 进入存储过程
People are asking how the ID's come in:
ID's are in a List<string>
in the code, and are used for other things in the code before then being sent to a stored procedure. Currently, the ID's are coming into the stored procedure as a 'User-Defined Table Type' with only one column (ID's).
人们在问 ID 是如何进来的:IDList<string>
在代码中,并且在被发送到存储过程之前用于代码中的其他内容。目前,ID 作为“用户定义的表类型”进入存储过程,只有一列(ID)。
I thought having them in a table might be better than having the code concatenate a massive string and just spitting it into the SP as a variable that looks like id1, id2, id3, id4
etc
我认为将它们放在表格中可能比让代码连接一个巨大的字符串并将其作为一个看起来像id1, id2, id3, id4
等的变量吐到 SP 中更好
采纳答案by Vladimir Baranov
I'm using your third option and it works great.
我正在使用您的第三个选项,效果很好。
My stored procedure has a table-valued parameter. See also Use Table-Valued Parameters.
In the procedure there is one statement, no loops, like you said:
在程序中有一个语句,没有循环,就像你说的:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);
It is better to call the procedure once, than 1,500 times. It is better to have one transaction, than 1,500 transactions.
调用该过程一次比调用 1,500 次要好。最好有一笔交易,而不是 1,500 笔交易。
If the number of rows in the @definedTable
goes above, say, 10K, I'd consider splitting it in batches of 10K.
如果@definedTable
上面的行数超过 10K,我会考虑将它分成 10K 的批次。
Your first variant is OK for few values in the IN
clause, but when you get to really high numbers (60K+) you can see something like this, as shown in this answer:
您的第一个变体对于IN
子句中的几个值是可以的,但是当您获得非常高的数字(60K+)时,您会看到类似这样的内容,如本答案所示:
Msg 8623, Level 16, State 1, Line 1 The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
消息 8623,级别 16,状态 1,第 1 行 查询处理器耗尽内部资源,无法生成查询计划。这是一种罕见的事件,仅适用于极其复杂的查询或引用大量表或分区的查询。请简化查询。如果您认为自己错误地收到了此消息,请联系客户支持服务以获取更多信息。
回答by Gordon Linoff
Your first or third options are the best way to go. For either of them, you want an index on table1(id)
.
您的第一个或第三个选项是最好的选择。对于它们中的任何一个,您都需要一个索引table1(id)
。
In general, it is better to run one query rather than multiple queries because the overhead of passing data in and out of the database adds up. In addition, each update starts a transactions and commits it -- more overhead. That said, this will probably not be important unless you are updating thousands of records. The overhead is measured in hundreds of microseconds or milliseconds, on a typical system.
通常,最好运行一个查询而不是多个查询,因为将数据传入和传出数据库的开销加起来。此外,每次更新都会启动一个事务并提交——更多的开销。也就是说,除非您要更新数千条记录,否则这可能并不重要。在典型系统上,开销以数百微秒或毫秒为单位进行测量。
回答by Laserbeak
You should definitely NOT use a loop and send an entire new SQL statement for each ID. In that case, the SQL engine has to recompile the SQL statement and come up with an execution plan, etc. every single time.
您绝对不应该使用循环并为每个 ID 发送一个全新的 SQL 语句。在这种情况下,SQL 引擎每次都必须重新编译 SQL 语句并提出执行计划等。
Probably the best thing to do is to make a prepared statement with a placeholder then loop through your data executing the statement for each value. Then the statement stays in the database engine's memory and it quickly just executes it with the new value each time you call it rather than start from scratch.
可能最好的办法是使用占位符制作一个准备好的语句,然后循环遍历您的数据,为每个值执行该语句。然后该语句保留在数据库引擎的内存中,并且每次调用它时它都会使用新值快速执行它,而不是从头开始。
If you have a large database and/or run this often, also make sure you create an index on that ID value, otherwise it will have to do a full table scan with every value.
如果您有一个大型数据库和/或经常运行它,还要确保在该 ID 值上创建一个索引,否则它将必须对每个值进行全表扫描。
EDIT:
编辑:
Perl pseudocode as described below:
Perl 伪代码如下所述:
#!/usr/bin/perl
use DBI;
$dbh = DBI->connect('dbi:Oracle:MY_DB', 'scott', 'tiger', { RaiseError => 1, PrintError =>1, AutoCommit => 0 });
$sth = $dbh->prepare ("UPDATE table1 SET somecolumn = ? WHERE id = ?");
foreach $tuple (@updatetuples) {
$sth->execute($$tuple[1], $$tuple[0]);
}
$dbh->commit;
$sth->finish;
$dbh->disconnect;
exit (0);