
Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): http://stackoverflow.com/questions/16977898/


MySQL optimizing INSERT speed being slowed down because of indices

Tags: mysql, sql, insert, indexing

Asked by Peeyush Kushwaha

The MySQL docs say:

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

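As a rough, back-of-the-envelope illustration of that claim (a sketch, not a benchmark), log N grows very slowly, so a 10x larger table makes the per-row index-maintenance term only about 16% larger:

```python
import math

# Sketch of the docs' claim: B-tree index maintenance per insert
# grows roughly with log N. Doubling N adds a constant (+1 in log2
# terms); a 10x larger table raises log2(N) by only ~3.3.
for n in (1_470_000, 2_940_000, 14_700_000):
    print(f"N = {n:>10,}  log2(N) = {math.log2(n):.2f}")
```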

Does this mean that for each new row inserted, the insertion speed will be slowed down by a factor of log N, where N, I assume, is the number of rows? Even if I insert all rows in just one query? i.e.:


INSERT INTO mytable VALUES (1,1,1), (2,2,2),  (3,3,3), .... ,(n,n,n)

Where n is ~70,000


I currently have ~1.47 million rows in a table with the following structure :


CREATE TABLE mytable (
   `id` INT,
   `value` MEDIUMINT(5),
   `date` DATE,
   PRIMARY KEY (`id`, `date`)
) ENGINE = InnoDB;

When I insert in the above-mentioned fashion inside a transaction, the commit takes ~275 seconds. How can I optimize this, since new data is added every day and insert times will just keep slowing down?


Also, is there anything apart from the queries themselves that might help? Maybe some configuration settings?


Possible Method 1 - Removing Indices


I read that removing indices just before the insert might help insert speed, and that after the inserts I can add the index again. But here the only index is the primary key, and dropping it won't help much in my opinion. Also, while the primary key is dropped, all the SELECT queries will be cripplingly slow.

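For reference, a sketch of what this method would look like for the table above. Note that InnoDB clusters rows by the primary key, so dropping and re-adding it rewrites the whole table, which is part of why the rebuild is expensive:

```sql
-- Sketch of "method 1": drop the composite primary key,
-- bulk-insert, then rebuild the key.
ALTER TABLE mytable DROP PRIMARY KEY;
-- ... bulk INSERT or LOAD DATA here ...
ALTER TABLE mytable ADD PRIMARY KEY (`id`, `date`);
```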

I do not know of any other possible methods.


Edit: Here are a few tests on inserting ~60,000 rows into the table with ~1.47 million rows:


Using the plain query described above: 146 seconds


Using MySQL's LOAD DATA INFILE: 145 seconds


Using MySQL's LOAD DATA INFILE and splitting the CSV files as suggested by David Jashi in his answer: 136 seconds for 60 files with 1,000 rows each, 136 seconds for 6 files with 10,000 rows each

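For reference, a hedged sketch of the LOAD DATA variant used in these tests; the file path and CSV layout are assumptions, not the actual test setup:

```sql
LOAD DATA INFILE '/tmp/mytable_chunk.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(`id`, `value`, `date`);
```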

Removing and re-adding the primary key: key removal took 11 seconds and inserting the data took 0.8 seconds, BUT re-adding the primary key took 153 seconds, for a total of ~165 seconds


Answered by N.B.

If you want fast inserts, the first thing you need is proper hardware. That means a sufficient amount of RAM, an SSD instead of mechanical drives, and a rather powerful CPU.


Since you use InnoDB, you will want to tune it, because the default config is designed for slow, old machines.


Here's a great read about configuring InnoDB

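As a rough illustration of the kind of settings involved (these values are generic starting points for a machine with ~16 GB of RAM, not settings taken from the linked article; benchmark before adopting them):

```ini
[mysqld]
# Biggest single win for InnoDB: cache as much data and index as fits.
innodb_buffer_pool_size = 8G
# Larger redo logs smooth out heavy write bursts.
innodb_log_file_size = 1G
# 2 = flush the log to disk about once per second instead of on every
# commit; trades up to ~1s of durability for much faster commits.
innodb_flush_log_at_trx_commit = 2
```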

After that, you need to understand one thing: how databases do their work internally, how hard drives work, and so on. I'll simplify the mechanism in the following description:


In a transaction, MySQL waits for the hard drive to confirm that it wrote the data. That's why transactions are slow on mechanical drives: they can do 200-400 input/output operations per second. Translated, that means you can get around 200 insert queries per second using InnoDB on a mechanical drive. Naturally, this is a simplified explanation, just to outline what's happening; it's not the full mechanism behind transactions.


Since a query, especially one matching the small row size of your table, is relatively tiny in terms of bytes, you're effectively wasting precious IOPS on a single query.


If you wrap multiple queries (100 or 200 or more; there's no exact number, you have to test) in a single transaction and then commit it, you'll instantly achieve more writes per second.

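In SQL terms, the batching described here looks like this sketch (the batch size and values are placeholders to be tuned against your own workload):

```sql
START TRANSACTION;
INSERT INTO mytable VALUES (1, 1, '2020-01-01');
INSERT INTO mytable VALUES (2, 2, '2020-01-01');
-- ... a few hundred more inserts ...
COMMIT;  -- one disk sync for the whole batch instead of one per row
```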

Percona guys are achieving 15k inserts a second on relatively cheap hardware. Even 5k inserts a second isn't bad. A table such as yours is small; I've done tests on a similar table (3 more columns) and I managed to get to 1 billion records without noticeable issues, using a machine with 16 GB of RAM and a 240 GB SSD (1 drive, no RAID, used for testing purposes).


TL;DR: follow the link above, configure your server, get an SSD, wrap multiple inserts in one transaction, and profit. And don't turn indexing off and then on; it's not always applicable, because at some point you will spend processing and IO time to build the indexes anyway.


Answered by David Jashi

Dropping the index will surely help. Also consider using LOAD DATA. You can find some comparisons and benchmarks here.


Also, when constructing the PRIMARY KEY, use fields that come first in your table, sequentially, i.e. switch the places of the second and third fields in the structure.

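If I read the suggestion correctly, that would mean declaring `date` as the second column so the PRIMARY KEY columns lead the structure, as in this sketch (whether this actually speeds up InnoDB inserts is questionable; treat it as an experiment, not a rule):

```sql
CREATE TABLE mytable (
  `id`    INT,
  `date`  DATE,          -- moved up so both key columns come first
  `value` MEDIUMINT(5),
  PRIMARY KEY (`id`, `date`)
) ENGINE = InnoDB;
```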

Answered by Old Pro

If you are doing a bulk insert of a million rows, then dropping the index, doing the insert, and rebuilding the index will probably be faster. However, if your problem is that single-row inserts are taking too long, then you have other problems (like not enough memory), and dropping the index will not help much.


Answered by JR Lawhorne

Building/rebuilding the index is what you're trying to speed up. If you must have this table/key structure, faster hardware and/or tweaking the server configuration to speed up the index build is likely the answer; make sure your server and settings are such that the build can be done in memory.


Otherwise, think about making trade-offs with the structure that would improve insert speeds. Alternatively, think about ways you can happily live with a 3 minute insert.


Answered by jgmjgm

I have found that in some cases inserting in medium-sized transaction chunks can help, since it sometimes seems to allow some bulk operations. In other cases it has made things slower, presumably due to locks and the overhead of transactions.

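A minimal sketch of that chunking idea (the chunk size of 10,000 is a guess to be benchmarked, and the actual INSERTs are only indicated in a comment since no database connection is shown):

```python
def chunked(rows, size):
    """Yield successive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# 70,000 synthetic rows matching the (id, value, date) shape above.
rows = [(n, n, "2020-01-01") for n in range(1, 70_001)]

# Each chunk would be wrapped in its own transaction:
#   START TRANSACTION; INSERT ...; COMMIT;
batches = list(chunked(rows, 10_000))
print(len(batches), len(batches[0]))  # 7 batches of 10,000 rows
```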