Why are batch inserts/updates faster in SQL? How do batch updates work?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1006969/
Why are batch inserts/updates faster? How do batch updates work?
Asked by letronje
Why are batch inserts faster? Is it because the connection and setup overhead for inserting a single row is the same for a set of rows? What other factors make batch inserts faster?
How do batch updates work? Assuming the table has no uniqueness constraints, insert statements don't really have any effect on other insert statements in the batch. However, during batch updates, an update can alter the state of the table and hence can affect the outcome of other update queries in the batch.
I know that batch insert queries have a syntax where you put all the insert values in one big query. What do batch update queries look like? E.g. if I have single update queries of the form:
update <table> set <column>=<expression> where <condition1>
update <table> set <column>=<expression> where <condition2>
update <table> set <column>=<expression> where <condition3>
update <table> set <column>=<expression> where <condition4>
What happens when they are used in a batch? What will the single query look like?
And are batch inserts & updates part of the SQL standard?
Answered by Qerr
I was looking for an answer on the same subject, about "bulk/batch" updates. People usually describe the problem by comparing it with an INSERT clause that has multiple value sets (the "bulk" part).
INSERT INTO mytable (mykey, mytext, myint)
VALUES
(1, 'text1', 11),
(2, 'text2', 22),
...
A clear answer still eluded me, but I found the solution here: http://www.postgresql.org/docs/9.1/static/sql-values.html
To make it clear:
UPDATE mytable
SET
mytext = myvalues.mytext,
myint = myvalues.myint
FROM (
VALUES
(1, 'textA', 99),
(2, 'textB', 88),
...
) AS myvalues (mykey, mytext, myint)
WHERE mytable.mykey = myvalues.mykey
It has the same property of being "bulk", i.e. containing a lot of data in one statement.
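This bulk-update pattern can be exercised from a client library too. A minimal sketch using Python's sqlite3 module, with the table and values mirroring the example above (an assumption here: SQLite only supports UPDATE ... FROM since version 3.33, and it names inline VALUES columns column1, column2, ...):

```python
import sqlite3

# Requires SQLite >= 3.33 (UPDATE ... FROM), bundled with recent Python builds
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mykey INTEGER PRIMARY KEY, mytext TEXT, myint INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, "text1", 11), (2, "text2", 22)])

# One statement carries every new value as an inline VALUES list;
# SQLite names the VALUES columns column1, column2, column3.
conn.execute("""
    UPDATE mytable
    SET mytext = v.column2, myint = v.column3
    FROM (VALUES (1, 'textA', 99), (2, 'textB', 88)) AS v
    WHERE mytable.mykey = v.column1
""")
print(conn.execute("SELECT * FROM mytable ORDER BY mykey").fetchall())
```

The WHERE join against the key column is what maps each value row to its target row, exactly as in the PostgreSQL example above.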
Answered by Quassnoi
Why are batch inserts faster?
For numerous reasons, but the major three are these:
- The query doesn't need to be reparsed.
- The values are transmitted in one round-trip to the server
- The commands are inside a single transaction
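Two of these overheads (statement preparation and transaction handling) can be seen even in an embedded database. A rough sketch with Python's sqlite3 module; the table name t is made up for the demo, and since there is no network here, the measured gap mostly reflects per-statement and per-transaction cost rather than round-trips:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
rows = [(i, f"val{i}") for i in range(5000)]

# Row-by-row: one statement execution and one transaction (commit) per row
t0 = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO t VALUES (?, ?)", r)
    conn.commit()
row_by_row = time.perf_counter() - t0

conn.execute("DELETE FROM t")
conn.commit()

# Batched: the statement is prepared once and all rows go in a single transaction
t0 = time.perf_counter()
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
batched = time.perf_counter() - t0

print(f"row-by-row: {row_by_row:.4f}s, batched: {batched:.4f}s")
```

Over a real network connection, the per-row variant also pays one round-trip per statement, which usually dominates everything else.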
Is it because the connection and setup overhead for inserting a single row is the same for a set of rows?
Partially yes, see above.
How do batch updates work?
This depends on the RDBMS.
In Oracle, you can transmit all the values as a collection and use this collection as a table in a JOIN.
In PostgreSQL and MySQL, you can use the following syntax:
INSERT
INTO mytable
VALUES
(value1),
(value2),
…
You can also prepare a query once and call it in some kind of loop. Usually there are methods to do this in a client library.
Assuming the table has no uniqueness constraints, insert statements don't really have any effect on other insert statements in the batch. But, during batch updates, an update can alter the state of the table and hence can affect the outcome of other update queries in the batch.
Yes, and you may or may not benefit from this behavior.
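A tiny illustration of that order dependence, sketched with Python's sqlite3 module (table t and the values are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 10)")

# Statement order matters: the second UPDATE only matches the row
# because the first UPDATE already changed it.
conn.execute("UPDATE t SET v = v * 2 WHERE id = 1")  # v: 10 -> 20
conn.execute("UPDATE t SET v = v + 5 WHERE v = 20")  # matches the row the first update produced
print(conn.execute("SELECT v FROM t WHERE id = 1").fetchone()[0])  # 25
```

Running the two UPDATEs in the opposite order would leave v at 20, since the `WHERE v = 20` condition would not match anything yet.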
I know that batch insert queries have a syntax where you put all the insert values in one big query. What do batch update queries look like?
In Oracle, you use a collection in a join:
MERGE
INTO mytable
USING TABLE(:mycol)
ON …
WHEN MATCHED THEN
UPDATE
SET …
In PostgreSQL:
UPDATE mytable
SET s_start = q.s_start
FROM (
VALUES
(value1),
(value2),
…
) AS q (s_start)
WHERE …
Answered by hagello
The other posts explain why bulk statements are faster and how to do it with literal values.
I think it is important to know how to do it with placeholders. Not using placeholders may lead to gigantic command strings, to quoting/escaping bugs and thereby to applications that are prone to SQL injection.
Bulk insert with placeholders in PostgreSQL >= 9.1
To insert an arbitrary number of rows into table "mytable", consisting of columns "col1", "col2" and "col3", all in one go (one statement, one transaction):
INSERT INTO mytable (col1, col2, col3)
VALUES (unnest(?), unnest(?), unnest(?))
You need to supply three arguments to this statement. The first one has to contain all the values for the first column and so on. Consequently, all the arguments have to be lists/vectors/arrays of equal length.
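To illustrate that column-wise, equal-length argument shape, here is a sketch using Python's sqlite3 module. SQLite has no unnest, so the client zips the per-column lists back into rows, which is effectively what unnest does server-side in PostgreSQL:

```python
import sqlite3

# Per-column argument lists, the shape the unnest(?) placeholders above expect
col1 = [1, 2, 3]
col2 = ["a", "b", "c"]
col3 = [10, 20, 30]
assert len(col1) == len(col2) == len(col3)  # all lists must be equal length

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 INTEGER)")
# zip() pairs up the i-th element of each list into one row tuple
conn.executemany("INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)",
                 zip(col1, col2, col3))
```

With the PostgreSQL unnest form, you would instead bind the three lists directly as the three statement arguments and let the server expand them into rows.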
Bulk update with placeholders in PostgreSQL >= 9.1
Let's say, your table is called "mytable". It consists of the columns "key" and "value".
update mytable
set value = data_table.new_value
from
(select unnest(?) as key, unnest(?) as new_value) as data_table
where mytable.key = data_table.key
I know, this is not easy to understand. It looks like obfuscated SQL. On the other side: It works, it scales, it works without any string concatenation, it is safe and it is blazingly fast.
You need to supply two arguments to this statement. The first one has to be a list/vector/array that contains all the values for column "key". Of course, the second one has to contain all the values for column "value".
In case you hit size limits, you may have to look into COPY ... FROM STDIN (PostgreSQL).
Answered by HLGEM
In a batch update, the database works against a set of data; in a row-by-row update it has to run the same command as many times as there are rows. So if you insert a million rows in a batch, the command is sent and processed once; in a row-by-row update, it is sent and processed a million times. This is also why you never want to use a cursor in SQL Server, or a correlated subquery.
An example of a set-based update in SQL Server:
update mytable
set myfield = 'test'
where myfield is null
This would update all 1 million records that are null in one step. A cursor update (which is how you would update a million rows in a non-batch fashion) would iterate through the rows one at a time and update each.
The problem with a batch insert is the size of the batch. If you try to update too many records at once, the database may lock the table for the duration of the process, locking all other users out. So you may need a loop that processes only part of the batch at a time (though pretty much any chunk larger than one row at a time will be faster than one row at a time). This is slower than updating, inserting, or deleting the whole batch in one statement, but faster than row-by-row operations, and it may be needed in a production environment with many users and little available downtime, when users need to see and update other records in the same table. The right batch size depends greatly on the database structure and on exactly what is happening (tables with triggers and lots of constraints are slower, as are tables with many fields, and so require smaller batches).
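A chunked loop of that kind might be sketched like this with Python's sqlite3 module (the chunk size and table are illustrative only; real sizing depends on your schema and workload):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
rows = [(i, f"v{i}") for i in range(2500)]

CHUNK = 1000  # tune per schema: triggers, constraints and wide rows call for smaller chunks

# Commit each chunk in its own transaction so table locks are held only briefly,
# letting other users read and write between chunks.
for start in range(0, len(rows), CHUNK):
    with conn:  # one transaction per chunk
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows[start:start + CHUNK])

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2500
```

Each chunk still gets the full batch benefit (one prepared statement, one transaction), while the pauses between chunks keep the table available to other sessions.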