Original question: http://stackoverflow.com/questions/1793169/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Which is faster: multiple single INSERTs or one multiple-row INSERT?
Asked by dusoft
I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT or are multiple separate INSERTs faster?
Answered by mbarkhau
https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
- Connecting: (3)
- Sending query to server: (2)
- Parsing query: (2)
- Inserting row: (1 × size of row)
- Inserting indexes: (1 × number of indexes)
- Closing: (1)
From this it should be obvious that sending one large statement will save you an overhead of 7 per insert statement. Further on, the same text also says:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
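The proportions above can be turned into a rough back-of-the-envelope comparison. This is only a toy model, not something from the MySQL manual: it takes the fixed overhead of 7 per statement from the answer as given, ignores the closing cost, and assumes the per-row cost is simply row size plus number of indexes. The function names and default values are made up for illustration.

```python
# Toy cost model based on the approximate proportions quoted above.
# Assumption: a fixed overhead of 7 units is paid once per statement
# (the figure used in the answer); closing cost is ignored.
FIXED_OVERHEAD_PER_STATEMENT = 7


def per_row_cost(row_size=1, n_indexes=1):
    # "Inserting row: 1 x size of row" plus "Inserting indexes: 1 x number of indexes"
    return 1 * row_size + 1 * n_indexes


def single_row_cost(n_rows, row_size=1, n_indexes=1):
    # Every row is its own statement, so every row pays the fixed overhead.
    return n_rows * (FIXED_OVERHEAD_PER_STATEMENT + per_row_cost(row_size, n_indexes))


def multi_row_cost(n_rows, row_size=1, n_indexes=1):
    # One statement carries all rows, so the fixed overhead is paid once.
    return FIXED_OVERHEAD_PER_STATEMENT + n_rows * per_row_cost(row_size, n_indexes)
```

With the defaults, 100 rows cost 900 units as separate statements and 207 units as one multi-row statement. The real ratio depends on row size, index count, and network latency, so treat these numbers as an illustration of the shape of the saving rather than a prediction.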
Answered by Jon Kloske
I know I'm answering this question almost two and a half years after it was asked, but I just wanted to provide some hard data from a project I'm working on right now. It shows that doing multiple VALUE blocks per insert is indeed MUCH faster than sequential single VALUE block INSERT statements.
The code I wrote for this benchmark in C# uses ODBC to read data into memory from an MSSQL data source (~19,000 rows, all of which are read before any writing commences), and the MySql .NET connector (Mysql.Data.*) stuff to INSERT the data from memory into a table on a MySQL server via prepared statements. It was written in such a way as to allow me to dynamically adjust the number of VALUE blocks per prepared INSERT (i.e., insert n rows at a time, where I could adjust the value of n before a run). I also ran the test multiple times for each n.
Doing single VALUE blocks (i.e., 1 row at a time) took 5.7 - 5.9 seconds to run. The other values are as follows:
2 rows at a time: 3.5 - 3.5 seconds
5 rows at a time: 2.2 - 2.2 seconds
10 rows at a time: 1.7 - 1.7 seconds
50 rows at a time: 1.17 - 1.18 seconds
100 rows at a time: 1.1 - 1.4 seconds
500 rows at a time: 1.1 - 1.2 seconds
1000 rows at a time: 1.17 - 1.17 seconds
So yes, even just bundling 2 or 3 writes together provides a dramatic improvement in speed (runtime cut by a factor of n), until you get to somewhere between n = 5 and n = 10, at which point the improvement drops off markedly, and somewhere in the n = 10 to n = 50 range the improvement becomes negligible.
Hope that helps people decide on (a) whether to use the multiprepare idea, and (b) how many VALUE blocks to create per statement (assuming you want to work with data that may be large enough to push the query past the max query size for MySQL, which I believe is 16MB by default in a lot of places, possibly larger or smaller depending on the value of max_allowed_packet set on the server.)
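As a rough illustration of the "n VALUE blocks per prepared INSERT" idea (the benchmark itself was written in C#, and that code is not shown here), the following Python sketch builds one parameterized multi-row INSERT per chunk of n rows. The helper names `chunked` and `build_multi_row_insert` are hypothetical, and the `%s` placeholder is an assumption that matches common Python MySQL drivers; other drivers may use a different marker.

```python
def chunked(rows, n):
    """Yield consecutive slices of at most n rows."""
    for start in range(0, len(rows), n):
        yield rows[start:start + n]


def build_multi_row_insert(table, columns, rows, placeholder="%s"):
    """Return (sql, params) for a single INSERT carrying every row in `rows`.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values go through placeholders.
    """
    if not rows:
        raise ValueError("at least one row is required")
    # One "(%s, %s, ...)" group per row, i.e. one VALUE block per row.
    row_group = "(" + ", ".join([placeholder] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_group] * len(rows))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

A caller would loop over `chunked(all_rows, n)`, build one statement per chunk, and pass `sql` and `params` to the driver's `execute`. Keeping n adjustable makes it easy to repeat the kind of measurement described above and to stay under the server's max_allowed_packet.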
Answered by MarkR
A major factor will be whether you're using a transactional engine and whether you have autocommit on.
Autocommit is on by default and you probably want to leave it on; therefore, each insert that you do does its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.
Assuming a single thread, that means that the server needs to sync some data to disc for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed ram in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.
I'm of course assuming that you're using a transactional engine (usually innodb) AND that you haven't tweaked the settings to reduce durability.
I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit because some versions of MySQL have working group-commit in innodb - this means that multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.
On the other hand, the upshot is that you REALLY WANT TO USE multi-row inserts.
There is a limit over which it gets counter-productive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.
If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.
Answered by RC.
Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.
Answered by ennuikiller
In general, the fewer calls to the database the better (meaning faster and more efficient), so try to code the inserts in a way that minimizes database accesses. Remember, unless you're using a connection pool, each database access has to create a connection, execute the SQL, and then tear down the connection. Quite a bit of overhead!
Answered by Antibarbie
You might want to:
- Check that auto-commit is off
- Open Connection
- Send multiple batches of inserts in a single transaction (batch size of about 4,000-10,000 rows; see what works for you)
- Close connection
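The steps above can be sketched roughly as follows. To keep the sketch runnable without a MySQL server, it uses Python's built-in `sqlite3` module as a stand-in; that is an assumption of this example, not something from the answer. With a MySQL driver the placeholder is typically `%s` rather than `?`, and autocommit should be confirmed off on the connection. The function name, the `items` table, and the default batch size are hypothetical.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_one_transaction(conn, rows, batch_size=250):
    """Insert all rows as multi-row INSERTs inside a single transaction.

    Assumes a hypothetical table items(id, name) and that autocommit is
    off, so nothing becomes durable until the final commit.
    """
    cur = conn.cursor()
    try:
        for chunk in batched(rows, batch_size):
            # One "(?, ?)" group per row in this batch.
            placeholders = ", ".join(["(?, ?)"] * len(chunk))
            params = [value for row in chunk for value in row]
            cur.execute(
                "INSERT INTO items (id, name) VALUES " + placeholders, params
            )
        conn.commit()  # a single commit covers every batch
    except Exception:
        conn.rollback()  # leave the table unchanged on failure
        raise
    finally:
        cur.close()
```

The batch size is deliberately small here because SQLite limits the number of bound parameters per statement; against MySQL, the 4,000-10,000 row range suggested above would be the place to start experimenting.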
Depending on how well your server scales (it's definitely OK with PostgreSQL, Oracle and MSSQL), do the thing above with multiple threads and multiple connections.
Answered by Chris Williams
In general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.
Depending on which language you are using, you may be able to create a batch in your programming/scripting language before going to the db and add each insert to the batch. Then you would be able to execute a large batch using one connect operation. Here's an example in Java.
Answered by A_01
MySQL 5.5: one SQL insert statement took ~300 to ~450 ms, while the stats below are for inline multiple-insert statements.
(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time : 00:00:00:000
Total Time : 00:00:03:343
I would say inline is the way to go :)
Answered by John
It's ridiculous how badly MySQL and MariaDB are optimized when it comes to inserts. I tested MySQL 5.7 and MariaDB 10.3 and saw no real difference between them.
I've tested this on a server with NVMe disks, 70,000 IOPS, and 1.1 GB/sec sequential throughput, which is possible in full duplex (read and write).
The server is a high-performance server as well.
I gave it 20 GB of RAM.
The database was completely empty.
The speed I got was 5,000 inserts per second when doing multi-row inserts (tried it with 1 MB up to 10 MB chunks of data).
Now the clue:
If I add another thread and insert into the SAME tables I suddenly have 2x5000 /sec.
One more thread and I have 15000 total /sec
Consider this: when ONE thread does the inserts, it can write to the disk sequentially (with the exception of indexes). Using multiple threads should actually degrade the possible performance, because the disk now needs to do a lot more random accesses. But a reality check shows MySQL is so badly optimized that threads help a lot.
The real performance possible with such a server is probably millions of inserts per second; the CPU is idle and the disk is idle.
The reason is quite clearly that MariaDB, just like MySQL, has internal delays.
Answered by MSS
Multi-row inserts are faster, but only up to a threshold. Another trick is to temporarily disable constraint checks, which makes inserts much, much faster. It doesn't matter whether your table has constraints or not. For example, try disabling foreign key checks and enjoy the speed:
SET FOREIGN_KEY_CHECKS=0;
Of course, you should turn it back on after the inserts with:
SET FOREIGN_KEY_CHECKS=1;
This is a common way to insert huge amounts of data. Data integrity may break, so you should take care of that before disabling foreign key checks.
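One way to make sure the check is always switched back on, even if an insert fails halfway, is to wrap the two SET statements around the bulk load. The sketch below assumes a Python DB-API style cursor with an `execute` method; the helper name `foreign_key_checks_disabled` is made up for illustration. Because the SET statements here change the setting for the current session, the inserts have to run on the same connection as the cursor passed in.

```python
from contextlib import contextmanager


@contextmanager
def foreign_key_checks_disabled(cursor):
    """Disable MySQL foreign key checks for the duration of a with-block."""
    cursor.execute("SET FOREIGN_KEY_CHECKS=0")
    try:
        yield cursor
    finally:
        # Runs on success and on error, so the check is never left off.
        cursor.execute("SET FOREIGN_KEY_CHECKS=1")
```

Usage would look like `with foreign_key_checks_disabled(cur): cur.execute(insert_sql, params)`. This only restores the setting; it does not validate rows that were inserted while the check was off, so the integrity caveat above still applies.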