Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/714950/

Time: 2020-08-31 13:03:03  Source: igfitidea

MySQL performance optimization: order by datetime field

mysql performance select

Asked by Dennis G.

I have a table with roughly 100,000 blog postings, linked to a table with 50 feeds via a 1:n relationship. When I query both tables with a select statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings table (simplified):

+---------------------+--------------+------+-----+---------+----------------+
| Field               | Type         | Null | Key | Default | Extra          |
+---------------------+--------------+------+-----+---------+----------------+
| id                  | int(11)      | NO   | PRI | NULL    | auto_increment |
| feed_id             | int(11)      | NO   | MUL | NULL    |                |
| crawl_date          | datetime     | NO   |     | NULL    |                |
| is_active           | tinyint(1)   | NO   | MUL | 0       |                |
| link                | varchar(255) | NO   | MUL | NULL    |                |
| author              | varchar(255) | NO   |     | NULL    |                |
| title               | varchar(255) | NO   |     | NULL    |                |
| excerpt             | text         | NO   |     | NULL    |                |
| long_excerpt        | text         | NO   |     | NULL    |                |
| user_offtopic_count | int(11)      | NO   | MUL | 0       |                |
+---------------------+--------------+------+-----+---------+----------------+

And here's the feeds table:

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| type        | int(11)      | NO   | MUL | 0       |                |
| title       | varchar(255) | NO   |     | NULL    |                |
| website     | varchar(255) | NO   |     | NULL    |                |
| url         | varchar(255) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

And here's the query that takes >1 second to execute. Please note that the post_date field has an index, but MySQL isn't using it to sort the postings table:

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    (`postings`)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15  

The result of the EXPLAIN EXTENDED command on this query shows that MySQL is using filesort:

+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table    | type   | possible_keys                         | key       | key_len | ref                      | rows  | Extra                       |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
|  1 | SIMPLE      | postings | ref    | feed_id,is_active,user_offtopic_count | is_active | 1       | const                    | 30996 | Using where; Using filesort |
|  1 | SIMPLE      | feeds    | eq_ref | PRIMARY,type                          | PRIMARY   | 4       | feedian.postings.feed_id |     1 | Using where                 |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+

When I remove the ORDER BY part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things, such as creating a combined index on all WHERE/ORDER BY fields, as suggested by a few blog postings, but this didn't work either.

Answered by Quassnoi

Create a composite index on postings (is_active, post_date) (in that order).

It will be used both for filtering on is_active and for ordering by post_date.

MySQL should show the REF access method over this index in EXPLAIN EXTENDED.

Note that you have a RANGE filtering condition on user_offtopic_count; that is why you cannot use a single index both for filtering on this field and for sorting by another field.

Depending on how selective your user_offtopic_count condition is (i.e. how many rows satisfy user_offtopic_count < 10), it may be more useful to create an index on user_offtopic_count and let the post_dates be sorted.
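
One way to gauge that selectivity is to count how many rows each filter keeps. This is a minimal sketch that assumes only the postings columns shown in the question; the output column names (total_rows and so on) are made up for illustration.

```sql
-- Rough selectivity check for the filters in the question's WHERE clause.
-- In MySQL a comparison evaluates to 1 or 0, so SUM(...) counts matching rows.
SELECT
    COUNT(*)                                        AS total_rows,
    SUM(is_active = 1)                              AS active_rows,
    SUM(is_active = 1 AND user_offtopic_count < 10) AS active_low_offtopic_rows
FROM postings;
```

If active_low_offtopic_rows is a small fraction of active_rows, the range index is likely to leave few rows to sort; if the two numbers are close, the date-ordered index is probably the better bet.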

To do this, create a composite index on postings (is_active, user_offtopic_count) and make sure the RANGE access method over this index is used.

Which index will be faster depends on your data distribution. Create both indexes, FORCE them, and see which is faster:

CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count);
CREATE INDEX ix_active_date ON postings (is_active, post_date);

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_offtopic)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15;

/* This should show RANGE access with few rows and keep the FILESORT */

SELECT 
    `postings`.`id`, 
    UNIX_TIMESTAMP(postings.post_date) as post_date, 
    `postings`.`link`, 
    `postings`.`title`, 
    `postings`.`author`, 
    `postings`.`excerpt`, 
    `postings`.`long_excerpt`, 
    `feeds`.`title` AS feed_title, 
    `feeds`.`website` AS feed_website
FROM 
    `postings` FORCE INDEX (ix_active_date)
JOIN 
    `feeds` 
ON 
    `feeds`.`id` = `postings`.`feed_id`
WHERE 
    `feeds`.`type` = 1 AND 
    `postings`.`user_offtopic_count` < 10 AND 
    `postings`.`is_active` = 1
ORDER BY 
    `postings`.`post_date` desc
LIMIT 
    15;

/* This should show REF access with lots of rows and no FILESORT */
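
To check which access method each forced index actually gets, the same statements can be prefixed with EXPLAIN. The sketch below assumes the two indexes created above exist and shortens the column list for readability; a real comparison should use the full column list, since the selected columns can influence the plan.

```sql
-- Plan with the range index: expect type = range and "Using filesort" in Extra.
EXPLAIN
SELECT postings.id, postings.title
FROM postings FORCE INDEX (ix_active_offtopic)
JOIN feeds ON feeds.id = postings.feed_id
WHERE feeds.type = 1
  AND postings.user_offtopic_count < 10
  AND postings.is_active = 1
ORDER BY postings.post_date DESC
LIMIT 15;

-- Plan with the date index: expect type = ref and no "Using filesort" in Extra.
EXPLAIN
SELECT postings.id, postings.title
FROM postings FORCE INDEX (ix_active_date)
JOIN feeds ON feeds.id = postings.feed_id
WHERE feeds.type = 1
  AND postings.user_offtopic_count < 10
  AND postings.is_active = 1
ORDER BY postings.post_date DESC
LIMIT 15;

-- Once timings are compared, the slower index can be dropped, for example:
-- DROP INDEX ix_active_offtopic ON postings;
```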

Answered by ???u

MySQL has two filesort algorithms: an older filesort that sorts records on disk, and a new version that works in memory.

If it cannot use an index on the first table in the join to sort the query, it will have to do a filesort. If the result set, converted to fixed-width format before sorting, is larger than the sort buffer, OR if it contains any text fields, it will have to use the slower on-disk filesort algorithm (the second condition is satisfied here, since your query has a text field).
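
The sort buffer size and the sort counters can be inspected from the client to see whether a given sort spilled out of the buffer. This is only a sketch of one way to look; the 4 MB value is an arbitrary number picked for the experiment, not a recommendation.

```sql
-- Current sort buffer size for this session, in bytes.
SHOW SESSION VARIABLES LIKE 'sort_buffer_size';

-- Note the sort counters before running the slow query...
SHOW SESSION STATUS LIKE 'Sort%';

-- ... run the SELECT from the question here ...

-- ...and compare afterwards. A rise in Sort_merge_passes suggests the
-- rows to sort did not fit in the sort buffer.
SHOW SESSION STATUS LIKE 'Sort%';

-- Optional experiment: raise the buffer for this session only (assumed value).
SET SESSION sort_buffer_size = 4194304;
```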

MySQL is choosing to use the is_active column, ostensibly because it thinks that column is most selective in eliminating rows before it continues with the other joins and where conditions. The first thing I would suggest would be to try creating composite indexes with post_date, feed_id, and the columns in the where condition, e.g. (is_active, user_offtopic_count, post_date, feed_id).
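
As a sketch, that suggestion could look like the following. The index name ix_active_offtopic_date_feed is made up for illustration, and the column list in the SELECT is shortened. Because user_offtopic_count is filtered with a range condition, the post_date part of this index may not remove the filesort, so it is worth checking the plan rather than assuming.

```sql
-- Composite index covering the WHERE columns, the sort column and the join column.
CREATE INDEX ix_active_offtopic_date_feed
    ON postings (is_active, user_offtopic_count, post_date, feed_id);

-- Let the optimizer choose freely and see whether it picks the new index
-- and whether "Using filesort" still appears in the Extra column.
EXPLAIN
SELECT postings.id, postings.title
FROM postings
JOIN feeds ON feeds.id = postings.feed_id
WHERE feeds.type = 1
  AND postings.user_offtopic_count < 10
  AND postings.is_active = 1
ORDER BY postings.post_date DESC
LIMIT 15;
```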

Answered by Chris Henry

Also, it's important to remember that MySQL won't use an index if the column you're ordering by has a function applied to it.

You should also try aliasing UNIX_TIMESTAMP(postings.post_date) as something other than post_date. This will tell MySQL to order by the unaltered column, and you'll still select the Unix timestamp.
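
A minimal sketch of that change, using a shortened version of the question's query; post_ts is a hypothetical alias chosen so that it no longer collides with the post_date column name.

```sql
SELECT
    postings.id,
    UNIX_TIMESTAMP(postings.post_date) AS post_ts,  -- hypothetical alias
    postings.title
FROM postings
WHERE postings.is_active = 1
ORDER BY postings.post_date DESC  -- raw column, no function applied
LIMIT 15;
```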
