MySQL "SELECT COUNT(*)" 很慢,即使有 where 子句
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/511820/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
"SELECT COUNT(*)" is slow, even with where clause
Asked by Ovid
I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
+----------+
| COUNT(*) |
+----------+
| 3224022 |
+----------+
1 row in set (1 min 0.16 sec)
Comparing that to a full count:
select count(*) from change_event;
+----------+
| count(*) |
+----------+
| 6069102 |
+----------+
1 row in set (4.21 sec)
The explain statement doesn't help me here:
explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: me
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: NULL
rows: 4120213
Extra: Using where; Using index
1 row in set (0.00 sec)
OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.
Here's the table definition:
CREATE TABLE `change_event` (
`change_event_id` bigint(20) NOT NULL default '0',
`timestamp` datetime NOT NULL,
`change_type` enum('create','update','delete','noop') default NULL,
`changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL,
`changed_object_id` varchar(255) default NULL,
`changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00',
`modified` datetime NOT NULL default '1000-01-01 00:00:00',
`created` datetime NOT NULL default '1000-01-01 00:00:00',
`pid` char(15) default NULL,
`episode_pid` char(15) default NULL,
`import_id` int(11) NOT NULL,
`status` enum('success','failure') NOT NULL,
`xml_diff` text,
`node_digest` char(32) default NULL,
PRIMARY KEY (`change_event_id`),
KEY `idx_change_events_changed_object_id` (`changed_object_id`),
KEY `idx_change_events_episode_pid` (`episode_pid`),
KEY `fk_import_id` (`import_id`),
KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`),
KEY `idx_change_event_status` (`status`),
CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Version:
$ mysql --version
mysql Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0
Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).
Answered by ???u
InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.
Two things I would try:
- Run "optimize table". This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.
- Create an additional non-primary index on just the change_event_id column. This will store a copy of that column in index pages, which will be much faster to scan. After creating it, check the explain plan to make sure it's using the new index.
(you also probably want to make the change_event_id column bigint unsigned if it's incrementing from zero)
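A minimal sketch of both suggestions against the question's table (the index name is mine, not from the answer):

-- Rebuild the table so data pages are physically stored in PK order
OPTIMIZE TABLE change_event;

-- Secondary index on just the PK column; its pages are far narrower than
-- the clustered data pages (which drag along xml_diff), so a range scan
-- over it touches far fewer pages
CREATE INDEX idx_change_event_id ON change_event (change_event_id);

-- Confirm the optimizer picks the new index for the range count
EXPLAIN SELECT COUNT(*) FROM change_event
WHERE change_event_id > '1212281603783391'\G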
Answered by benjismith
Here are a few things I suggest:
Change the column from a "bigint" to an "int unsigned". Do you really ever expect to have more than 4.2 billion records in this table? If not, then you're wasting space (and time) on the extra-wide field. MySQL indexes are more efficient on smaller data types.
Run the "OPTIMIZE TABLE" command, and see whether your query is any faster afterward.
You might also consider partitioning your table according to the ID field, especially if older records (with lower ID values) become less relevant over time. A partitioned table can often execute aggregate queries faster than one huge, unpartitioned table.
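A sketch of the first and third suggestions, with caveats: the ALTER is only safe if no ID will ever exceed 4,294,967,295; partitioning requires MySQL 5.1+ (the question's server is 5.0); the partition boundaries below are made up; and MySQL does not allow foreign keys on partitioned tables, so fk_change_event_import would have to be dropped first.

-- Narrow the key: 4 bytes per index entry instead of 8
ALTER TABLE change_event MODIFY change_event_id int unsigned NOT NULL;

-- Illustrative range partitioning by ID (MySQL 5.1+ only)
ALTER TABLE change_event PARTITION BY RANGE (change_event_id) (
    PARTITION p0   VALUES LESS THAN (2000000000),
    PARTITION p1   VALUES LESS THAN (4000000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);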
EDIT:
Looking more closely at this table, it looks like a logging-style table, where rows are inserted but never modified.
If that's true, then you might not need all the transactional safety provided by the InnoDB storage engine, and you might be able to get away with switching to MyISAM, which is considerably more efficient on aggregate queries.
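If you do go that way, a sketch of the conversion; note that MyISAM does not support foreign keys, so fk_change_event_import has to go first, and you lose transactions and row-level locking:

-- Drop the InnoDB-only foreign key, then switch the storage engine
ALTER TABLE change_event DROP FOREIGN KEY fk_change_event_import;
ALTER TABLE change_event ENGINE=MyISAM;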
Answered by chaos
I've run into behavior like this before with IP geolocation databases. Past some number of records, MySQL's ability to get any advantage from indexes for range-based queries apparently evaporates. With the geolocation DBs, we handled it by segmenting the data into chunks that were reasonable enough to allow the indexes to be used.
Answered by Random Developer
Check to see how fragmented your indexes are. At my company we have a nightly import process that trashes our indexes, and over time it can have a profound impact on data access speeds. For example, we had a SQL procedure that took 2 hours to run; one day, after de-fragmenting the indexes, it took 3 minutes. We use SQL Server 2005; I'll look for a script that can check this on MySQL.
Update: Check out this link: http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html
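If I recall that page correctly, the fix it suggests is a "null" ALTER that rebuilds the table; against this table that would be:

-- Rebuilds the table and all of its indexes in place, defragmenting them
ALTER TABLE change_event ENGINE=InnoDB;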
Answered by Alnitak
Run "analyze table_name
" on that table - it's possible that the indices are no longer optimal.
You can often tell this by running "show index from table_name". If the cardinality value is NULL then you need to force re-analysis.
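Against the question's table, that check looks like this (ANALYZE TABLE is the full form of the command):

SHOW INDEX FROM change_event;   -- a NULL Cardinality column means stale stats
ANALYZE TABLE change_event;     -- re-collects key distribution statistics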
Answered by Sergiy Tytarenko
MySQL does say "Using where" first, since it does need to read all records/values from the index data to actually count them. With InnoDB it also tries to "grab" that 4 million record range to count it.
You may need to experiment with different transaction isolation levels: http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html#isolevel_read-uncommitted
and see which one is better.
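A sketch of running the count under READ UNCOMMITTED, which may avoid some of the consistent-read overhead at the price of seeing uncommitted rows (restoring the default REPEATABLE READ afterwards is my addition):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM change_event WHERE change_event_id > '1212281603783391';
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;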
With MyISAM it would just be fast, but an intensive write workload will result in lock issues.
Answered by Armando Cordova
To make the search more efficient, I recommend adding an index. I leave the command for you to try the metrics again:
CREATE INDEX ixid_1 ON change_event (change_event_id);
and repeat the query:
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
-JACR
Answered by knoopx
I would create a "counters" table and add "create row"/"delete row" triggers to the table you are counting. The triggers should increase/decrease the count values in the "counters" table on every insert/delete, so you won't need to compute them every time you need them.
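A sketch of that idea; the counter table, trigger names, and seeding step are all mine, triggers require MySQL 5.0.2+ (the question's 5.0.37 qualifies), and note this tracks the total count, not the ranged count from the question:

CREATE TABLE row_counters (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  BIGINT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Seed the counter once from the current contents
INSERT INTO row_counters
SELECT 'change_event', COUNT(*) FROM change_event;

-- Keep it in sync on every insert and delete
CREATE TRIGGER change_event_count_ins AFTER INSERT ON change_event
FOR EACH ROW UPDATE row_counters
    SET row_count = row_count + 1 WHERE table_name = 'change_event';

CREATE TRIGGER change_event_count_del AFTER DELETE ON change_event
FOR EACH ROW UPDATE row_counters
    SET row_count = row_count - 1 WHERE table_name = 'change_event';

-- Reading the count is now a primary-key lookup
SELECT row_count FROM row_counters WHERE table_name = 'change_event';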
You can also accomplish this on the application side by caching the counters, but this will involve clearing the "counter cache" on every insertion/deletion.
For some reference, take a look at http://pure.rednoize.com/2007/04/03/mysql-performance-use-counter-tables/