MySQL 如何提高大型 InnoDB 表的 DELETE FROM 性能?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/14284238/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can I improve DELETE FROM performance on large InnoDB tables?
提问by mpe
I have a fairly large InnoDB table which contains about 10 million rows (and counting, it is expected to become 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that is taking ages. This is the table structure:
我有一个相当大的 InnoDB 表,其中包含大约 1000 万行(并且计数,预计将成为该大小的 20 倍)。每行都不是那么大(平均 131 B),但有时我不得不删除其中的一大块,这需要很长时间。这是表结构:
CREATE TABLE `problematic_table` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`taxid` int(10) unsigned NOT NULL,
`blastdb_path` varchar(255) NOT NULL,
`query` char(32) NOT NULL,
`target` int(10) unsigned NOT NULL,
`score` double NOT NULL,
`evalue` varchar(100) NOT NULL,
`log_evalue` double NOT NULL DEFAULT '-999',
`start` int(10) unsigned DEFAULT NULL,
`end` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxid` (`taxid`),
KEY `query` (`query`),
KEY `target` (`target`),
KEY `log_evalue` (`log_evalue`)
) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;
Queries that delete large chunks from the table are simply like this:
从表中删除大块的查询就像这样:
DELETE FROM problematic_table WHERE problematic_table.taxid = '57';
A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.
像这样的查询只花了将近一个小时才完成。我可以想象索引重写开销使这些查询非常慢。
I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.
我正在开发一个将在预先存在的数据库上运行的应用程序。我很可能无法控制服务器变量,除非我对它们进行强制性更改(我不想这样做),所以恐怕改变这些的建议价值不大。
I have tried to INSERT ... SELECT
those rows that I don't want to delete into a temporary table and just dropping the rest, but as the ratio of to-delete vs. to-keep shifts towards to-keep, this is no longer a useful solution.
我已经尝试将INSERT ... SELECT
那些我不想删除到临时表中的行删除,而只是删除其余的行,但是随着删除与保留的比率向保留转变,这不再是一个有用的解决方案.
This is a table that may see frequent INSERT
s and SELECT
s in the future, but no UPDATE
s. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.
这个表以后可能会看到频繁的INSERT
s和SELECT
s,但是没有UPDATE
s。基本上,它是一个需要不时删除部分内容的日志记录和参考表。
Could I improve my indexes on this table by limiting their length? Would switching to MyISAM help, which supports DISABLE KEYS
during transactions? What else could I try to improve DELETE
performance?
我可以通过限制它们的长度来改进我在这个表上的索引吗?切换到 MyISAM 有帮助DISABLE KEYS
吗,它在交易期间支持?我还能尝试什么来提高DELETE
性能?
Edit:One such deletion would be in the order of about one million of rows.
编辑:一个这样的删除将是大约一百万行。
采纳答案by mpe
This solution can provide better performance once completed, but the process may take some time to implement.
此解决方案一旦完成即可提供更好的性能,但该过程可能需要一些时间来实施。
A new BIT
column can be added and defaulted to TRUE
for "active" and FALSE
for "inactive". If that's not enough states, you could use TINYINT
with 256 possible values.
BIT
可以添加新列并默认TRUE
为“活动”和FALSE
“非活动”。如果这还不够状态,您可以使用TINYINT
256 个可能的值。
Adding this new column will probably take a long time, but once it's over, your updates should be much faster as long as you do it off the PRIMARY
as you do with your deletes and don't index this new column.
添加这个新列可能需要很长时间,但是一旦它结束,您的更新应该会快得多,只要您PRIMARY
像删除时一样执行它并且不索引这个新列。
The reason why InnoDB takes so long to DELETE
on such a massive table as yours is because of the cluster index. It physically orders your table based upon your PRIMARY
, first UNIQUE
it finds, or whatever it can determine as an adequate substitute if it can't find PRIMARY
or UNIQUE
, so when one row is deleted, it now reorders your entire table physically on the disk for speed and defragmentation. So it's not the DELETE
that's taking so long; it's the physical reordering after that row is removed.
InnoDBDELETE
在像您这样庞大的表上花费这么长时间的原因是集群索引。它根据您的PRIMARY
, 首先UNIQUE
它找到的 , 或者它可以确定为适当替代品的任何内容对您的表进行物理排序,如果找不到PRIMARY
或UNIQUE
碎片整理。所以这不是DELETE
需要这么长时间;这是删除该行后的物理重新排序。
When you create a fixed width column and update that instead of deleting, there's no need for physical reordering across your huge table because the space consumed by a row and table itself is constant.
当您创建一个固定宽度的列并更新它而不是删除时,不需要在巨大的表中进行物理重新排序,因为行和表本身消耗的空间是恒定的。
During off hours, a single DELETE
can be used to remove the unnecessary rows. This operation will still be slow but collectively much faster than deleting individual rows.
在下班时间,DELETE
可以使用单个删除不必要的行。此操作仍然会很慢,但总体上比删除单个行要快得多。
回答by vdd
I had a similar scenario with a table with 2 million rows and a delete statement, which should delete around a 100 thousand rows - it took around 10 minutes to do so.
我有一个类似的场景,有一个包含 200 万行的表和一个删除语句,它应该删除大约 10 万行 - 大约需要 10 分钟。
After I checked the configuration, I found that MySQL Server was running with default innodb_buffer_pool_size
= 8 MB (!).
检查配置后,我发现 MySQL Server 以 default innodb_buffer_pool_size
= 8 MB (!) 运行。
After restart with innodb_buffer_pool_size
= 1.5GB, the same scenario took 10 sec.
使用innodb_buffer_pool_size
= 1.5GB重新启动后,同样的场景需要 10 秒。
So it looks like there is a dependency if "reordering of the table" can fit in buffer_pool or not.
因此,如果“表的重新排序”是否适合 buffer_pool,似乎存在依赖性。
回答by Jan Larsen
I solved a similar problem by using a stored procedure, thereby improving performance by a factor of several thousand.
我通过使用存储过程解决了类似的问题,从而将性能提高了数千倍。
My table had 33M rows and several indexes and I wanted to delete 10K rows. My DB was in Azure with no control over innodb_buffer_pool_size.
我的表有 33M 行和几个索引,我想删除 10K 行。我的数据库在 Azure 中,无法控制 innodb_buffer_pool_size。
For simplicity I created a table tmp_id
with only a primary id
field:
为简单起见,我创建了一个tmp_id
只有一个主要id
字段的表:
CREATE TABLE `tmp_id` (
`id` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
I selected the set of ids I wanted to delete into tmp_id
and ran delete from my_table where id in (select id from tmp_id);
This did not complete in 12 hours, so I tried with only a single id in tmp_id
and it took 25 minutes. Doing delete from my_table where id = 1234
completed in a few milliseconds, so I decided to try doing that in a procedure instead:
我选择了我想删除的一组 idtmp_id
并运行delete from my_table where id in (select id from tmp_id);
这并没有在 12 小时内完成,所以我只尝试了一个 id,tmp_id
花了 25 分钟。这样做delete from my_table where id = 1234
在几毫秒内完成,所以我决定尝试这样做,而不是在一个过程:
CREATE PROCEDURE `delete_ids_in_tmp`()
BEGIN
declare finished integer default 0;
declare v_id bigint(20);
declare cur1 cursor for select id from tmp_id;
declare continue handler for not found set finished=1;
open cur1;
igmLoop: loop
fetch cur1 into v_id;
if finished = 1 then leave igmLoop; end if;
delete from problematic_table where id = v_id;
end loop igmLoop;
close cur1;
END
Now call delete_ids_in_tmp();
deleted all 10K rows in less than a minute.
现在call delete_ids_in_tmp();
在不到一分钟的时间内删除了所有 10K 行。
回答by thephper
I have an InnoDB table with around 200 million rows and I did experience the same issue. Deleting rows took forever.
我有一个包含大约 2 亿行的 InnoDB 表,我确实遇到了同样的问题。删除行需要很长时间。
There are a primary key, a unique key and multiple compound indexes on the table.
表上有一个主键、一个唯一键和多个复合索引。
When deleting in smaller chunks it went pretty fast, so I decided to make a stored procedure that simply deleted the rows in multiple iterations with a limit. Kind of like Jan Larsen's answer, but with no need for separate table.
当以较小的块删除时,它进行得非常快,因此我决定创建一个存储过程,该过程可以在有限制的多次迭代中简单地删除行。有点像 Jan Larsen 的回答,但不需要单独的表格。
That made it possible to delete large chunks of data (around 500K rows) within a few minutes.
这使得在几分钟内删除大块数据(大约 50 万行)成为可能。
It seems like the transaction that InnoDB has to make to be able to rollback the changes on errors are too big, and therefor cannot fit into memory, which is causing the delete to perform very bad.
看起来InnoDB为了能够回滚错误更改而必须进行的事务太大,因此无法放入内存,这导致删除执行非常糟糕。
The procedure:
步骤:
CREATE DEFINER=`root`@`%` PROCEDURE `delete_rows`()
BEGIN
declare v_max int unsigned default 100;
declare v_counter int unsigned default 1;
while v_counter < v_max do
DELETE from items where a = 'A' AND b = 'B' AND c = 'C' LIMIT 10000;
set v_counter=v_counter+1;
end while;
END
Then call it by:
然后通过以下方式调用它:
CALL delete_rows();
The where sentence matches a compound index starting with a,b,c-columns, which I think is important, so that MySQL do not have to make a full table scan to match the rows.
where 语句匹配以 a,b,c-columns 开头的复合索引,我认为这很重要,这样 MySQL 就不必进行全表扫描来匹配行。
回答by Prabhat
DELETE FROM problematic_table WHERE problematic_table.taxid = '57';
remove Quotes, Since taxid is Integer and Passing value in quotes makes its string, due to compare between Integer and String its doesn't pick Index.
删除引号,由于出租车是整数,并且在引号中传递值使其成为字符串,由于整数和字符串之间的比较,它不选择索引。
DELETE FROM problematic_table WHERE problematic_table.taxid = 57;