Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/3695768/
Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?
Asked by Legend
A short recap of what happened. I am working with 71 million records (not much compared to billions of records processed by others). On a different thread, someone suggested that the current setup of my cluster is not suitable for my need. My table structure is:
CREATE TABLE `IPAddresses` (
`id` int(11) unsigned NOT NULL auto_increment,
`ipaddress` bigint(20) unsigned default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM;
And I added the 71 million records and then did a:
ALTER TABLE IPAddresses ADD INDEX(ipaddress);
It's been 14 hours and the operation is still not completed. Upon Googling, I found that there is a well-known approach for solving this problem - Partitioning. I understand that I need to partition my table now based on the ipaddress but can I do this without recreating the entire table? I mean, through an ALTER statement? If yes, there was one requirement saying that the column to be partitioned on should be a primary key. I will be using the id of this ipaddress in constructing a different table so ipaddress is not my primary key. How do I partition my table given this scenario?
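The ipaddress column stores each IPv4 address as a single BIGINT. For readers unfamiliar with that representation, here is a small Python sketch (illustrative only, outside MySQL) of the conversion that MySQL's INET_ATON()/INET_NTOA() functions perform:

```python
def ip_to_int(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into one integer, the same
    value MySQL's INET_ATON() would store in a BIGINT column."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(n: int) -> str:
    """Inverse conversion (what MySQL's INET_NTOA() does)."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

# Example: ip_to_int("127.0.0.1") gives 2130706433
```

Storing addresses this way keeps the column fixed-width and cheaply comparable, which is why the table uses BIGINT rather than a string column.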
Answered by Legend
OK, it turns out this problem was more than just a simple create-a-table, index-it, and forget-about-it problem :) Here's what I did, in case someone else faces the same problem (I used IP addresses as an example, but it works for other data types too):
Problem: Your table has millions of entries and you need to add an index really fast
Use case: Consider storing millions of IP addresses in a lookup table. Adding the IP addresses is not a big problem, but creating an index on them takes more than 14 hours.
Solution: Partition your table using MySQL's Partitioning strategy
Case #1: When the table you want is not yet created
CREATE TABLE IPADDRESSES(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ipaddress BIGINT UNSIGNED,
PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;
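For an integer column, MySQL's PARTITION BY HASH is simply MOD(column_value, number_of_partitions), so each of the 20 partitions above holds roughly 1/20th of the rows, and each partition's index is correspondingly smaller and faster to build. A Python sketch of that routing, for illustration only:

```python
NUM_PARTITIONS = 20

def partition_for(ipaddress: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """MySQL's HASH partitioning for an integer expression is just
    MOD(expression, number_of_partitions): the row lands in that partition."""
    return ipaddress % num_partitions

# Rows spread evenly across partitions for well-distributed values,
# so the per-partition ipaddress index is about 1/20th the total size.
```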
Case #2: When the table you want is already created. There seems to be a way to use ALTER TABLE to do this, but I have not yet figured out a proper solution. Instead, there is a slightly less efficient approach:
CREATE TABLE IPADDRESSES_TEMP(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ipaddress BIGINT UNSIGNED,
PRIMARY KEY(id)
) ENGINE=MYISAM;
Insert your IP addresses into this table. And then create the actual table with partitions:
CREATE TABLE IPADDRESSES(
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ipaddress BIGINT UNSIGNED,
PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;
And then, finally:
INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
DROP TABLE IPADDRESSES_TEMP;
ALTER TABLE IPADDRESSES ADD INDEX(ipaddress);
And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.
Answered by peufeu
Creating indexes with MySQL is slow, but not that slow. With 71 million records, it should take a couple of minutes, not 14 hours. Possible problems are:
- you have not configured sort buffer sizes and other configuration options
Look here: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size
If you try to generate a 1GB index with an 8MB sort buffer, it's going to take lots of passes. But if the buffer is larger than your CPU cache, it will get slower. So you have to test and see what works best.
- someone has a lock on the table
- your IO system sucks
- your server is swapping
- etc
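The sort-buffer point above can be made concrete with some rough arithmetic. An external merge sort whose input doesn't fit in the buffer needs multiple merge passes over the data. The sketch below is a back-of-envelope estimate only; the merge fan-in of 16 is an assumption for illustration, not MyISAM's actual algorithm:

```python
import math

def merge_passes(index_bytes: int, buffer_bytes: int, fan_in: int = 16) -> int:
    """Rough external-sort estimate: run generation produces about
    index_bytes / buffer_bytes sorted runs, and each merge pass then
    combines up to `fan_in` runs into one (fan_in is a hypothetical value)."""
    runs = math.ceil(index_bytes / buffer_bytes)
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return passes

# A 1GB index with an 8MB buffer needs multiple full passes over the
# data; raising the buffer to 256MB drops it to a single merge pass.
```

Each extra pass rereads and rewrites the whole index, which is why an undersized sort buffer can turn minutes into hours.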
As usual, check iostat, vmstat, logs, etc. Issue a LOCK TABLE on your table to check whether someone has a lock on it.
FYI, on my 64-bit desktop, creating an index on 10M random BIGINTs takes 17s...
Answered by Giel Berkers
I had a problem where I wanted to speed up my query by adding an index. The table only had about 300,000 records, but indexing still took far too long. When I checked the MySQL server processes, it turned out that the query I was trying to optimize was still running in the background, four times over! After I killed those queries, the indexing finished in a jiffy. Perhaps the same applies to your situation.
Answered by Michael Eakins
You are using MyISAM, which is being deprecated soon. An alternative would be InnoDB.
"InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints. You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement."
http://dev.mysql.com/doc/refman/5.0/en/innodb.html
According to http://dev.mysql.com/tech-resources/articles/storage-engine/part_1.html, you should be able to switch between engines with a simple ALTER command, which gives you some flexibility. It also states that each table in your DB can be configured independently.
Answered by seema
In your table you have already inserted 71 million records. If you now want to create partitions on the primary key column of your table, you can use the ALTER TABLE option. An example is given for your reference:
CREATE TABLE t1 (
id INT,
year_col INT
);
ALTER TABLE t1
PARTITION BY HASH(id)
PARTITIONS 8;