如何加速在 LIMIT 子句中带有大偏移量的 MySQL 查询?
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):Stack Overflow
原文地址: http://stackoverflow.com/questions/1243952/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
How can I speed up a MySQL query with a large offset in the LIMIT clause?
提问 by ZA.
I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
当对 MySQL 的 SELECT 使用带有大偏移量的 LIMIT 时,出现了性能问题:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
如果偏移量 m 大于 1,000,000,则操作非常慢。
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
我必须使用 limit m, n;不能使用类似 id > 1,000,000 limit n 这样的写法。
How can I optimize this statement for better performance?
如何优化此语句以获得更好的性能?
采纳答案 by Paul Dixon
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
也许您可以创建一个索引表,它提供与目标表中的键相关的顺序键。然后,您可以将此索引表连接到目标表,并使用 where 子句更有效地获取所需的行。
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
回答 by bart
There's a blog post somewhere on the internet on how you should best make the selection of the rows to show as compact as possible, thus: just the ids; and producing the complete results should in turn fetch all the data you want for only the rows you selected.
互联网上某处有一篇博客文章,介绍了应当让"要显示的行"的选择尽可能紧凑(也就是只取 id),然后在生成完整结果时,只针对选中的这些行去获取所需的全部数据。
Thus, the SQL might be something like (untested, I'm not sure it actually will do any good):
因此,SQL 可能类似于(未经测试,我不确定它实际上是否会有任何好处):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or if, against hope, it doesn't improve anything, it might be worthwhile to break this single statement into multiple statements and capture the ids into a data structure.
如果您的 SQL 引擎太原始而不支持这种 SQL 语句,或者(事与愿违)它没有带来任何改进,那么也许值得把这条语句拆分成多条语句,并把 id 捕获到一个数据结构中。
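A rough sketch of that multi-statement fallback, reusing the placeholder names (table, whatever, m, n) from the query above; this is an illustration, not part of the original answer:
-- step 1: fetch only the ids for the requested page and capture them in the application
select id from table order by whatever limit m, n
-- step 2: fetch the full rows for exactly those captured ids
select A.* from table A where A.id in (/* ids captured in step 1 */) order by A.whatever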
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
更新:我找到了我之前提到的那篇博客文章:是 Jeff Atwood 发表在 Coding Horror 上的 "All Abstractions Are Failed Abstractions"(所有抽象都是失败的抽象)。
回答 by Scott Nelson
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or you could formulate a WHERE clause using the min and max ids from the first query).
如果记录很大,查询慢可能是因为要加载数据。如果 id 列有索引,那么只查询 id 会快得多。然后,您可以用 IN 子句针对相应的 id 做第二次查询(或者用第一个查询得到的最小和最大 id 来构造 WHERE 子句)。
slow:
慢:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
快:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
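The min/max variant mentioned above could look roughly like this; :min_id and :max_id are placeholders for the smallest and largest id returned by the id-only query, and the approach assumes the ordering column is id itself, so every id in that range belongs to the page:
-- using the MIN and MAX of the ids returned by the id-only query above:
SELECT * FROM table WHERE id BETWEEN :min_id AND :max_id ORDER BY id DESC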
回答 by SlappyTheFish
I don't think there's any need to create a separate index if your table already has one. If so, then you can order by this primary key and then use values of the key to step through:
如果您的表已经有索引,我认为没有必要创建单独的索引。如果是这样,那么您可以按此主键排序,然后使用该键的值逐步执行:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the ID so that it can simply read the index and doesn't have to then locate all the data (reduce IO overhead). If you need some of the other columns then perhaps you could add these to the index so that they are read with the primary key (which will most likely be held in memory and therefore not require a disc lookup) - although this will not be appropriate for all cases so you will have to have a play.
另一个优化是不使用 SELECT * 而只取 ID,这样就可以只读索引,而不必再去定位所有数据(减少 IO 开销)。如果您还需要其他一些列,也许可以把它们加入索引,让它们随主键一起被读取(索引很可能保存在内存中,因此不需要磁盘查找),不过这并不适用于所有情况,所以您需要自己试验一下。
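A sketch of that covering-index suggestion; the index name and the extra columns (name, created_at) are made up for illustration and are not from the answer:
-- add the columns the query needs to a secondary index so it can be answered
-- from the index alone, without locating the full rows
ALTER TABLE myBigTable ADD INDEX idx_paging (id, name, created_at);
SELECT id, name, created_at FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;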
I wrote an article with more details:
我写了一篇更详细的文章:
http://www.4pmp.com/2010/02/scalable-mysql-avoid-offset-for-large-tables/
回答 by Hymanson Leung
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
Paul Dixon 的回答确实是解决问题的方法,但是您必须维护序列表并确保没有行间隙。
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps, and starts from id 1. Then grab the rows using the id for pagination.
如果这是可行的,更好的解决方案是简单地确保原始表没有行间隙,并从 id 1 开始。然后使用 id 获取行进行分页。
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
等等...
回答 by PhPGuy
I have run into this problem recently. There were two parts to the fix. First I had to use an inner select in my FROM clause that did my limiting and offsetting for me on the primary key only:
我最近遇到了这个问题。这个问题需要分两步来解决。首先,我必须在 FROM 子句中使用一个内部 SELECT,只基于主键完成限制和偏移:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the from part of my query:
然后我可以将其用作查询的 from 部分:
// assumed opening (reconstructed): the original snippet begins inside the select() call
$query = DB::query()
    ->select(
        'titles.id',
        'title_eisbns_concat.eisbns_concat',
        'titles.pub_symbol',
        'titles.title',
        'titles.subtitle',
        'titles.contributor1',
        'titles.publisher',
        'titles.epub_date',
        'titles.ebook_price',
        'publisher_licenses.id as pub_license_id',
        'license_types.shortname',
        $coversQuery
    )
    ->from($subQuery)
    ->leftJoin('titles', 't.id', '=', 'titles.id')
    ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
    ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
    ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
    ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.
我第一次创建这个查询时,在 MySQL 中使用的是 OFFSET 和 LIMIT。在翻到第 100 页之前一切正常,之后偏移量开始变得慢得难以忍受。把内部查询改用 BETWEEN 后,任何页面都变快了。我不确定为什么 MySQL 没有把 OFFSET 优化好,但 BETWEEN 似乎把性能拉了回来。
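In plain SQL, the change to the inner query described above amounts to something like the following sketch; the numbers are illustrative only:
-- original inner query: the offset grows with the page number, so MySQL must
-- scan and discard more and more rows for later pages
SELECT id FROM titles ORDER BY title LIMIT 1000 OFFSET 100000
-- revised inner query: a range condition on the primary key stays cheap
-- regardless of which page is requested
SELECT id FROM titles WHERE id BETWEEN 100001 AND 101000 ORDER BY title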