Original question: http://stackoverflow.com/questions/1332624/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Speeding up row counting in MySQL
Asked by Kevin Ivarsen
Suppose, for illustrative purposes, you are running a library using a simple MySQL "books" table with three columns:
(id, title, status)
- id is the primary key
- title is the title of the book
- status could be an enum describing the book's current state (e.g. AVAILABLE, CHECKEDOUT, PROCESSING, MISSING)
A simple query to report how many books fall into each state is:
SELECT status, COUNT(*) FROM books GROUP BY status
or to specifically find how many books are available:
SELECT COUNT(*) FROM books WHERE status = 'AVAILABLE'
However, once the table grows to millions of rows, these queries take several seconds to complete. Adding an index to the "status" column doesn't appear to make a difference in my experience.
Aside from periodically caching the results or explicitly updating summary info in a separate table each time a book changes state (via triggers or some other mechanism), are there any techniques for speeding up these kinds of queries? It seems that the COUNT queries end up looking at every row, and (without knowing more details) I'm a bit surprised that this information can't somehow be determined from the index.
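One way to check whether the index on status is actually being used is to look at the query plan. This is a minimal sketch, assuming the index is called idx_books_status (a hypothetical name; the question only says the column is indexed):

```sql
-- Hypothetical index name; skip this if the status column is already indexed.
CREATE INDEX idx_books_status ON books (status);

-- Show the plan for the grouped count.
EXPLAIN SELECT status, COUNT(*) FROM books GROUP BY status;

-- Show the plan for the single-status count.
EXPLAIN SELECT COUNT(*) FROM books WHERE status = 'AVAILABLE';
```

If the key column names the index and Extra shows "Using index", the count is read from the index rather than from the table rows. It still has to walk every matching index entry, which is why an index alone does not make the count instantaneous.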
UPDATE
Using the sample table (with an indexed "status" column) with 2 million rows, I benchmarked the GROUP BY query. Using the InnoDB storage engine, the query takes 3.0 - 3.2 seconds on my machine. Using MyISAM, the query takes 0.9 - 1.1 seconds. There was no significant difference between count(*), count(status), or count(1) in either case.
MyISAM is admittedly a bit faster, but I was curious to see if there was a way to make an equivalent query run much faster (e.g. 10-50 ms -- fast enough to be called on every webpage request for a low-traffic site) without the mental overhead of caching and triggers. It sounds like the answer is "there's no way to run the direct query quickly", which is what I expected. I just wanted to make sure I wasn't missing an easy alternative.
Accepted answer by Josh Davis
So the question is
are there any techniques for speeding up these kinds of queries?
Well, not really. A column-based storage engine would probably be faster with those SELECT COUNT(*) queries but it would be less performant for pretty much any other query.
Your best bet is to maintain a summary table via triggers. It doesn't have much overhead and the SELECT part will be instantaneous no matter how big the table. Here's some boilerplate code:
DELIMITER //

-- A new book increments the counter for its status.
CREATE TRIGGER ai_books AFTER INSERT ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total + 1 WHERE status = NEW.status
//

-- A deleted book decrements the counter for its status.
CREATE TRIGGER ad_books AFTER DELETE ON books
FOR EACH ROW
    UPDATE books_cnt SET total = total - 1 WHERE status = OLD.status
//

-- A status change moves one unit from the old counter to the new one.
CREATE TRIGGER au_books AFTER UPDATE ON books
FOR EACH ROW
BEGIN
    IF (OLD.status <> NEW.status) THEN
        UPDATE books_cnt
        SET total = total + IF(status = NEW.status, 1, -1)
        WHERE status IN (OLD.status, NEW.status);
    END IF;
END
//

DELIMITER ;
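The triggers above update a books_cnt table that the answer does not define. A possible layout, sketched under the assumption that it holds one row per status with a total column (the column types here are guesses, not from the answer):

```sql
-- Assumed layout: one row per status, matching the columns the triggers update.
CREATE TABLE books_cnt (
    status VARCHAR(20) NOT NULL PRIMARY KEY,
    total  INT NOT NULL DEFAULT 0
);

-- Seed the counters once from the existing data, before relying on the triggers.
INSERT INTO books_cnt (status, total)
SELECT status, COUNT(*) FROM books GROUP BY status;

-- Reading a count is then a primary-key lookup.
SELECT total FROM books_cnt WHERE status = 'AVAILABLE';
```

Because the triggers only UPDATE existing rows, a status that has no books at seeding time needs a row with total 0 inserted by hand. Otherwise later inserts for that status would not be counted.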
Answer by Sam Saffron
MyISAM is actually pretty fast with count(*). The downside is that MyISAM storage is not that reliable and is best avoided where data integrity is critical.
InnoDB can be very slow to perform count(*) type queries, because it is designed to allow for multiple concurrent views of the same data. So at any point in time, it's not enough to go to the index to get the count.
From: http://www.mail-archive.com/[email protected]/msg120320.html
The database starts with 1000 records in it.
- I start a transaction.
- You start a transaction.
- I delete 50 records.
- You add 50 records.
- I do a COUNT() and see 950 records.
- You do a COUNT() and see 1050 records.
- I commit my transaction - database now has 950 records to everyone but you.
- You commit your transaction - database has 1000 records again.
How InnoDB keeps up with which records are "visible" or "modifiable" with respect to any transaction is through row-level locking, transaction isolation levels, and multi-versioning.
http://dev.mysql.com/doc/refman/4.1/en/innodb-transaction-model.html
http://dev.mysql.com/doc/refman/4.1/en/innodb-multi-versioning.html
That is why counting how many records each person can see is not so straightforward.
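The quoted scenario can be replayed by hand with two connections. This is only a sketch: it assumes an InnoDB books table holding 1000 rows with ids 1 to 1000, an auto-increment id, and the default REPEATABLE READ isolation level, none of which the quote specifies.

```sql
-- Session A
START TRANSACTION;

-- Session B (a separate connection)
START TRANSACTION;

-- Session A: remove 50 rows, not yet committed.
DELETE FROM books WHERE id <= 50;

-- Session B: add rows, not yet committed.
INSERT INTO books (title, status) VALUES ('New book', 'PROCESSING');
-- ... repeated until 50 rows have been added ...

-- Session A: sees its own deletes but not B's inserts (950 in the quoted scenario).
SELECT COUNT(*) FROM books;

-- Session B: sees its own inserts but not A's deletes (1050 in the quoted scenario).
SELECT COUNT(*) FROM books;

-- Session A
COMMIT;

-- Session B
COMMIT;

-- Any new transaction now sees 1000 rows again.
SELECT COUNT(*) FROM books;
```

Each session gets a different, equally valid answer to the same COUNT(*), so there is no single stored row count InnoDB could hand back.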
So bottom line is you will need to look at caching the counts somehow as opposed to going to the table if you need to get at this information frequently and fast.
Answer by Alterlife
From: http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html
InnoDB does not keep an internal count of rows in a table. (In practice, this would be somewhat complicated due to multi-versioning.) To process a SELECT COUNT(*) FROM t statement, InnoDB must scan an index of the table, which takes some time if the index is not entirely in the buffer pool.
The solution suggested is:
To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. SHOW TABLE STATUS also can be used if an approximate row count is sufficient.
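For the approximate route mentioned in the quote, the estimate can be read as follows. A short sketch against the books table from the question:

```sql
-- The Rows column is only an estimate for InnoDB tables.
SHOW TABLE STATUS LIKE 'books';

-- The same estimate is exposed through information_schema.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'books';
```

Note that this estimates the size of the whole table only. It cannot give a per-status breakdown like the GROUP BY query in the question.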
In short: count(*) (on InnoDB) will take a long time for tables containing a large number of rows. This is by design and can't be helped.
Write your own workaround.
Answer by peufeu
There was no significant difference between count(*), count(status), or count(1)
count(column) returns the number of rows where column is NOT NULL. Since 1 is NOT NULL, and status is also, presumably, NOT NULL, the database will optimize away the test and convert them all to count(*). Which, ironically, does not mean "count rows where all columns are not null" (or any other combination), it just means "count rows"...
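The difference only shows up when the column is nullable. A small illustration, using a hypothetical count_demo table that is not part of the question:

```sql
-- Hypothetical demo table with a nullable status column.
CREATE TABLE count_demo (
    id     INT PRIMARY KEY,
    status VARCHAR(20) NULL
);

INSERT INTO count_demo (id, status) VALUES
    (1, 'AVAILABLE'),
    (2, NULL),
    (3, 'MISSING');

-- COUNT(*) and COUNT(1) count every row; COUNT(status) skips the NULL.
SELECT COUNT(*)      AS all_rows,
       COUNT(1)      AS ones,
       COUNT(status) AS non_null_status
FROM count_demo;
-- all_rows = 3, ones = 3, non_null_status = 2
```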
Now, back to your question, you can't have your cake and eat it...
- If you want an "exact" count to be available at all times, then you have to increment and decrement in real time, via triggers, which slows down your writes.
- Or you can use count(*), but this will be slow.
- Or you can settle for a rough estimate, or an out-of-date value, and use caching or other probabilistic approaches.
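The out-of-date option could look something like this. It is a sketch only: the books_cnt_cache table and its columns are assumptions, and the refresh would have to be run periodically (from cron or another scheduler).

```sql
-- Hypothetical cache table; the name and columns are assumptions.
CREATE TABLE books_cnt_cache (
    status       VARCHAR(20) NOT NULL PRIMARY KEY,
    total        INT NOT NULL,
    refreshed_at DATETIME NOT NULL
);

-- Recompute all counts in one pass; this is the slow query, run off the request path.
REPLACE INTO books_cnt_cache (status, total, refreshed_at)
SELECT status, COUNT(*), NOW() FROM books GROUP BY status;

-- Page requests read the cached value instead of scanning books.
SELECT total, refreshed_at FROM books_cnt_cache WHERE status = 'AVAILABLE';
```

A status that no longer has any books keeps its old row until it is cleared, so refreshed_at is worth checking when reading.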
Generally, for values above about "a few", NO ONE is interested in an exact real-time count. It is a red herring anyway, as by the time you read it, the value will most likely have changed.
Answer by Witt
Many answers here said an index wouldn't help but in my case it did...
My table used MyISAM, and had only about 100k rows. The query:
select count(*) from mytable where foreign_key_id=n
took 7-8 seconds to complete.
I added an index on foreign_key_id:
create index myindex on mytable (foreign_key_id) using btree;
After creating the index, the select statement above reported an execution time of 0.00 seconds.