Why does MYSQL higher LIMIT offset slow the query down?

Disclaimer: this page is a translated mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/4481388/

Date: 2020-08-31 18:02:54 · Source: igfitidea


Tags: mysql, performance, sql-order-by, limit

Asked by Rahman

Scenario in short: a table with more than 16 million records [2GB in size]. The higher the LIMIT offset in a SELECT, the slower the query becomes when using ORDER BY *primary_key*.


So


SELECT * FROM large ORDER BY `id`  LIMIT 0, 30 

takes far less than


SELECT * FROM large ORDER BY `id` LIMIT 10000, 30 

Both queries return only 30 records and sort them the same way either way, so the overhead isn't coming from ORDER BY.
Right now, fetching the latest 30 rows takes around 180 seconds. How can I optimize this simple query?


Answered by Nikos Kyr

I had the exact same problem myself. Given that you want to collect a large amount of this data rather than one specific set of 30 rows, you'll probably be running a loop and incrementing the offset by 30.


So what you can do instead is:


  1. Hold the last id of the previous batch of 30 rows (e.g. lastId = 530)
  2. Replace the offset with the condition WHERE id > lastId LIMIT 0, 30

So you can always have a ZERO offset. You will be amazed by the performance improvement.

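The loop described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: SQLite (via Python's `sqlite3`) stands in for MySQL, and the table name and row count are made up.

```python
import sqlite3

# Hypothetical demo table; SQLite stands in for MySQL here,
# but the keyset-pagination pattern is identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO large (id, payload) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 101)])

def fetch_pages(page_size=30):
    """Yield batches using the seek method: the offset is always zero."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM large WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        last_id = rows[-1][0]   # remember the last id of this batch
        yield rows

pages = list(fetch_pages())
print(len(pages), "pages")
```

Because each iteration seeks directly to `id > lastId` through the primary key, no rows are counted off and discarded, which is where the performance win comes from.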

Answered by Quassnoi

It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them). The higher this value is, the longer the query runs.


The query cannot jump straight to OFFSET because, first, the records can be of different lengths and, second, there can be gaps from deleted records. It needs to check and count each record on its way.


Assuming that id is a PRIMARY KEY of a MyISAM table, you can speed it up by using this trick:


SELECT  t.*
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT 10000, 30
        ) q
JOIN    mytable t
ON      t.id = q.id

See this article:

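A rough way to sanity-check that this deferred-join rewrite returns the same rows as the plain OFFSET query is the sketch below. SQLite stands in for MySQL here, so it only illustrates the equivalence of the two queries, not MyISAM's performance characteristics; the data is synthetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(i, f"data-{i}") for i in range(1, 20001)])

# Plain version: scans past 10000 full rows before returning 30.
plain = conn.execute(
    "SELECT * FROM mytable ORDER BY id LIMIT 30 OFFSET 10000").fetchall()

# Deferred join: locate the 30 ids via the (narrow) index first,
# then fetch the full rows only for those ids.
deferred = conn.execute("""
    SELECT t.*
    FROM (SELECT id FROM mytable ORDER BY id LIMIT 30 OFFSET 10000) q
    JOIN mytable t ON t.id = q.id
    ORDER BY t.id
""").fetchall()

assert plain == deferred
print(plain[0], plain[-1])
```

Note the outer ORDER BY t.id added here: the trick's speedup comes from counting off only index entries in the subquery instead of whole rows, and the outer join then touches just 30 rows.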

Answered by Riedsio

MySQL cannot go directly to the 10000th record (or the 80000th byte, as you're suggesting) because it cannot assume that the records are packed/ordered like that (or that the ids run continuously from 1 to 10000). Although it might be that way in actuality, MySQL cannot assume that there are no holes/gaps/deleted ids.


So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse the first 10000 entries of the index on id) before finding the 30 to return.


EDIT: To illustrate my point


Note that although


SELECT * FROM large ORDER BY id LIMIT 10000, 30 

would be slow(er),


SELECT * FROM large WHERE id >  10000 ORDER BY id LIMIT 30 

would be fast(er), and would return the same results provided that there are no missing ids (i.e. gaps).

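Both the equivalence and the caveat about gaps can be checked with a small sketch (SQLite as a stand-in for MySQL; table and sizes are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO large VALUES (?, ?)",
                 [(i, str(i)) for i in range(1, 20001)])

# With no gaps in id, the two forms return identical rows.
slow = conn.execute(
    "SELECT * FROM large ORDER BY id LIMIT 30 OFFSET 10000").fetchall()
fast = conn.execute(
    "SELECT * FROM large WHERE id > 10000 ORDER BY id LIMIT 30").fetchall()
assert slow == fast

# Introduce a gap: now the OFFSET-based window shifts by one row,
# while the WHERE-based query still starts at id 10001.
conn.execute("DELETE FROM large WHERE id = 5000")
slow2 = conn.execute(
    "SELECT * FROM large ORDER BY id LIMIT 30 OFFSET 10000").fetchall()
fast2 = conn.execute(
    "SELECT * FROM large WHERE id > 10000 ORDER BY id LIMIT 30").fetchall()
assert slow2 != fast2
```

This is exactly why the WHERE id > ... form is only a drop-in replacement when there are no missing ids (or when "30 rows starting after id 10000" is what you actually want, as in pagination).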

Answered by sym

I found an interesting way to optimize SELECT queries with ORDER BY id LIMIT X, Y. I have 35 million rows, so it took about 2 minutes to find a range of rows.


Here is the trick :


SELECT id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;

Just adding a WHERE clause with the last id you received improves performance a lot. For me it went from 2 minutes to 1 second :)


Other interesting tricks here : http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/


It works with strings too.

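For the string case, a hedged sketch of the same seek technique on a text column might look like this (hypothetical customers table, SQLite standing in for MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)",
                 [("ann",), ("bob",), ("carol",), ("dave",), ("erin",)])

last_name = "bob"  # last name seen on the previous page
page = conn.execute(
    "SELECT name FROM customers WHERE name > ? ORDER BY name LIMIT 2",
    (last_name,),
).fetchall()
print(page)  # [('carol',), ('dave',)]
```

The comparison relies on the column's collation order, so in MySQL the column should be indexed and the collation consistent; duplicate values in the sort column also need a tiebreaker (e.g. ORDER BY name, id) to paginate reliably.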

Answered by bobs

The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, 10000 rows are evaluated and 30 rows are returned. Some optimization may be possible in the data-reading process, but consider the following:


What if you had a WHERE clause in the queries? The engine must return all rows that qualify, and then sort the data, and finally get the 30 rows.


Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.


Answered by ch271828n

For those who are interested in a comparison and figures :)


Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT and TINYINT columns, as well as two TEXT fields (deliberately) containing about 1k characters each.


  • Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
  • Orange := @Quassnoi's method. SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
  • Of course, the third method, ... WHERE id>xxx LIMIT 0,5, does not appear here since it should be constant time.

Experiment 2: Similar thing, except that one row only has 3 BIGINTs.


  • green := the blue before
  • red := the orange before
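A minimal harness in the spirit of these experiments might look like the sketch below. SQLite replaces MySQL and the data is tiny, so the absolute timings are not comparable to the original charts; the point is only the measurement pattern (plain OFFSET vs. deferred join, at growing offsets).

```python
import sqlite3
import time

# Illustrative table mimicking the "post" table from the experiments above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO post VALUES (?, ?)",
                 [(i, "x" * 100) for i in range(1, 100001)])

def timed(sql, params=()):
    """Run a query once and return (elapsed_seconds, rows)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return time.perf_counter() - start, rows

for offset in (0, 10000, 50000, 90000):
    t_plain, _ = timed(
        "SELECT * FROM post ORDER BY id LIMIT 5 OFFSET ?", (offset,))
    t_join, _ = timed(
        "SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT 5 OFFSET ?) q "
        "JOIN post t ON t.id = q.id", (offset,))
    print(f"offset={offset}: plain={t_plain:.4f}s deferred={t_join:.4f}s")
```

On a real MySQL instance, the blue/orange curves above would correspond to plotting t_plain and t_join against the offset.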

(The original answer included charts plotting query time against offset for both experiments.)