Why does a higher MySQL LIMIT offset slow the query down?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/4481388/
Asked by Rahman
Scenario in short: a table with more than 16 million records [2GB in size]. When using ORDER BY *primary_key*, the higher the LIMIT offset in a SELECT, the slower the query becomes.
So
SELECT * FROM large ORDER BY `id` LIMIT 0, 30
takes far less time than
SELECT * FROM large ORDER BY `id` LIMIT 10000, 30
Only 30 records are ordered, the same either way, so the slowdown is not the overhead of ORDER BY.
Now, fetching the latest 30 rows takes around 180 seconds. How can I optimize this simple query?
Answered by Nikos Kyr
I had the exact same problem myself. Given that you want to collect a large amount of this data rather than one specific set of 30 rows, you are probably running a loop and incrementing the offset by 30.
So what you can do instead is:
- Hold the last id of a set of data (30) (e.g. lastId = 530)
- Add the condition WHERE id > lastId LIMIT 0, 30
So you can always have a ZERO offset. You will be amazed by the performance improvement.
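A minimal sketch of that loop, written as the successive queries it issues (the table name large and primary key id follow the question; the concrete value 530 is just an example):

-- page 1: start from a zero offset
SELECT * FROM large ORDER BY id LIMIT 30;
-- suppose the last row returned had id = 530; page 2 starts right after it
SELECT * FROM large WHERE id > 530 ORDER BY id LIMIT 30;
-- every later page repeats the pattern with the newest last id,
-- so the offset stays at zero no matter how deep you page

Note that this only works when you page forward in id order; jumping to an arbitrary page number still requires an offset.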
Answered by Quassnoi
It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them). The higher this value is, the longer the query runs.
The query cannot go right to OFFSET because, first, the records can be of different lengths and, second, there can be gaps from deleted records. It needs to check and count each record on its way.
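A small hypothetical illustration of why the position of the Nth row is unpredictable (the demo table and its contents are invented for this example):

CREATE TABLE demo (id INT PRIMARY KEY, payload TEXT);
INSERT INTO demo (id, payload) VALUES (1, 'a'), (2, 'bb'), (3, 'ccc'), (4, 'dddd');
DELETE FROM demo WHERE id = 2;   -- rows vary in length and id 2 leaves a gap
-- the engine cannot compute where "the third row" lives; it must walk past
-- the two surviving rows (id 1 and id 3) before it can return id 4:
SELECT * FROM demo ORDER BY id LIMIT 2, 1;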
Assuming that id is the PRIMARY KEY of a MyISAM table, you can speed it up by using this trick:
SELECT  t.*
FROM    (
        -- still counts off 10030 entries, but only in the index on id,
        -- which is far cheaper than reading the full rows
        SELECT  id
        FROM    mytable
        ORDER BY id
        LIMIT 10000, 30
        ) q
JOIN    mytable t
ON      t.id = q.id   -- fetch the complete rows for just those 30 ids
See this article:
Answered by Riedsio
MySQL cannot go directly to the 10000th record (or the 80000th byte as you're suggesting) because it cannot assume that the data is packed/ordered like that (or that it has continuous values from 1 to 10000). Although it might be that way in actuality, MySQL cannot assume that there are no holes/gaps/deleted ids.
So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse the first 10000 entries of the index on id) before finding the 30 to return.
EDIT: To illustrate my point
Note that although
SELECT * FROM large ORDER BY id LIMIT 10000, 30
would be slow(er),
SELECT * FROM large WHERE id > 10000 ORDER BY id LIMIT 30
would be fast(er), and would return the same results provided that there are no missing ids (i.e. gaps).
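A tiny hypothetical counterexample for that proviso, showing that a single missing id makes the two forms diverge:

CREATE TABLE gappy (id INT PRIMARY KEY);
INSERT INTO gappy (id) VALUES (1), (3), (4), (5);      -- id 2 is missing
SELECT * FROM gappy ORDER BY id LIMIT 2, 1;            -- skips 2 rows, returns id 4
SELECT * FROM gappy WHERE id > 2 ORDER BY id LIMIT 1;  -- returns id 3

With no gaps, both queries would return the same row; the keyset form is only equivalent to the offset form when ids are contiguous (or when you feed it the last id actually seen, as the loop-based answer above does).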
Answered by sym
I found an interesting way to optimize SELECT queries with ORDER BY id LIMIT X, Y. I have 35 million rows, so it took about 2 minutes to find a range of rows.
Here is the trick:
SELECT id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;
Just put the last id you got in the WHERE clause; it increases the performance a lot. For me it went from 2 minutes to 1 second :)
Other interesting tricks here: http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/
It works with strings too.
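A sketch of the same pattern on a string column (the column and the literal here are hypothetical):

SELECT id, name, address, phone
FROM customers
WHERE name > 'Mendez'      -- last name value seen on the previous page
ORDER BY name LIMIT 1000;
-- if the string column is not unique, break ties with the key, e.g.
-- WHERE (name, id) > ('Mendez', 990) ORDER BY name, id LIMIT 1000;

The tuple comparison in the comment is standard MySQL row-constructor syntax and keeps duplicate names from being skipped or repeated across pages.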
Answered by bobs
The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, 10000 rows are evaluated and 30 rows are returned. Some optimization can be done in the data-reading process, but consider the following:
What if you had a WHERE clause in the queries? The engine must find all rows that qualify, then sort the data, and finally get the 30 rows.
Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.
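A hypothetical example of that worst case (the status and created_at columns are invented for illustration):

SELECT * FROM large
WHERE status = 'active'   -- every qualifying row must be located first,
ORDER BY created_at       -- then all of them sorted on a non-key column,
LIMIT 10000, 30;          -- before the requested 30-row slice can be cut

Unless an index covers both the filter and the sort order, the offset cannot be satisfied by a simple index walk.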
Answered by ch271828n
For those who are interested in a comparison and figures :)
Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT and TINYINT columns, as well as two TEXT fields (deliberately) containing about 1k characters each.
- Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
- Orange := @Quassnoi's method: SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
- Of course, the third method, ... WHERE id > xxx LIMIT 0, 5, does not appear here since it should run in constant time.
Experiment 2: A similar setup, except that each row has only 3 BIGINTs.
- green := the blue method above
- red := the orange method above