MySQL: How to make a JOIN query use an index?

Question

Asked by Silver Light

I have two tables:

CREATE TABLE `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(1000) DEFAULT NULL,
  `last_updated` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `last_updated` (`last_updated`)
) ENGINE=InnoDB AUTO_INCREMENT=799681 DEFAULT CHARSET=utf8;

CREATE TABLE `article_categories` (
  `article_id` int(11) NOT NULL DEFAULT '0',
  `category_id` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`article_id`,`category_id`),
  KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This is my query:

SELECT a.*
FROM
    articles AS a,
    article_categories AS c
WHERE
    a.id = c.article_id
    AND c.category_id = 78
    AND a.comment_cnt > 0
    AND a.deleted = 0
ORDER BY a.last_updated
LIMIT 100, 20

And the EXPLAIN for it:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: index
possible_keys: PRIMARY
          key: last_updated
      key_len: 9
          ref: NULL
         rows: 2040
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY,fandom_id
          key: PRIMARY
      key_len: 8
          ref: db.a.id,const
         rows: 1
        Extra: Using index

It uses a full index scan of last_updated on the first table for sorting, but does not use any index for the join (type: index in EXPLAIN). This is very bad for performance and kills the whole database server, since this is a very frequent query.

I've tried reversing the table order with STRAIGHT_JOIN, but that gives "Using filesort; Using temporary", which is even worse.

Is there any way to make MySQL use an index for the join and for sorting at the same time?

=== update ===

I'm really desperate about this. Maybe some kind of denormalization can help here?

Answer 1

Answered by Quassnoi

If you have lots of categories, this query cannot be made efficient. No single index can cover two tables at once in MySQL.

You have to denormalize: add last_updated, has_comments and deleted to article_categories:

CREATE TABLE `article_categories` (
  `article_id` int(11) NOT NULL DEFAULT '0',
  `category_id` int(11) NOT NULL DEFAULT '0',
  `last_updated` timestamp NOT NULL,
  `has_comments` boolean NOT NULL,
  `deleted` boolean NOT NULL,
  PRIMARY KEY (`article_id`,`category_id`),
  KEY `category_id` (`category_id`),
  KEY `ix_articlecategories_category_comments_deleted_updated` (category_id, has_comments, deleted, last_updated)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

and run this query:

SELECT  *
FROM    (
        SELECT  article_id
        FROM    article_categories
        WHERE   (category_id, has_comments, deleted) = (78, 1, 0)
        ORDER BY
                last_updated DESC
        LIMIT   100, 20
        ) q
JOIN    articles a
ON      a.id = q.article_id
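
To see why this layout turns the whole filter into a single range scan, here is a minimal Python sketch. It models the composite index as a sorted list of tuples. This is an illustration under assumptions, not MySQL internals. The function names and example data are hypothetical, and the trailing article_id stands in for the primary-key columns that InnoDB stores in every secondary-index entry.

```python
from bisect import bisect_left

# Hypothetical index entries: (category_id, has_comments, deleted, last_updated, article_id).
# The first four fields model KEY (category_id, has_comments, deleted, last_updated);
# article_id models the primary-key columns InnoDB appends to secondary indexes.


def build_index(rows):
    """A B-tree keeps its keys sorted; a sorted list models that ordering."""
    return sorted(rows)


def page_by_category(index, category_id, offset, limit):
    """Article ids for (category_id, has_comments=1, deleted=0),
    newest last_updated first, skipping `offset` and returning `limit`."""
    lo = bisect_left(index, (category_id, 1, 0))  # first key with this prefix
    hi = bisect_left(index, (category_id, 1, 1))  # first key past the prefix
    matching = index[lo:hi]                       # one contiguous range, ordered by last_updated
    newest_first = matching[::-1]                 # read the range backwards for DESC
    return [entry[4] for entry in newest_first[offset:offset + limit]]


# Example data: even ids are in category 78; ids divisible by 3 have no comments.
rows = [(78 if i % 2 == 0 else 5, 1 if i % 3 else 0, 0, i, i) for i in range(1, 1001)]
index = build_index(rows)
page = page_by_category(index, 78, 100, 20)
```

Because the first three key parts are matched by equality, the entries that qualify are already ordered by last_updated. No sort step is needed.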

Of course, you should also update article_categories whenever you update the relevant columns in articles. This can be done in a trigger.
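
A minimal sketch of such a trigger follows. It uses Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server. The trigger name and the simplified column types are hypothetical. In MySQL the trigger body would need FOR EACH ROW (and usually a DELIMITER change in the mysql client). Rows newly inserted into article_categories still have to be filled in from articles; this sketch does that with an INSERT ... SELECT.

```python
import sqlite3


def make_db():
    """Build both tables plus a hypothetical sync trigger in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE articles (
            id INTEGER PRIMARY KEY,
            last_updated INTEGER NOT NULL,
            comment_cnt INTEGER NOT NULL DEFAULT 0,
            deleted INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE article_categories (
            article_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            last_updated INTEGER NOT NULL,
            has_comments INTEGER NOT NULL,
            deleted INTEGER NOT NULL,
            PRIMARY KEY (article_id, category_id)
        );
        CREATE INDEX ix_articlecategories_category_comments_deleted_updated
            ON article_categories (category_id, has_comments, deleted, last_updated);

        -- Copy the denormalized columns to every category row of the updated article.
        CREATE TRIGGER trg_articles_sync AFTER UPDATE ON articles
        BEGIN
            UPDATE article_categories
               SET last_updated = NEW.last_updated,
                   has_comments = (NEW.comment_cnt > 0),
                   deleted = NEW.deleted
             WHERE article_id = NEW.id;
        END;
    """)
    return db


db = make_db()
db.execute("INSERT INTO articles VALUES (1, 100, 0, 0)")
# New category rows are filled from articles; the trigger only covers later updates.
db.execute("""INSERT INTO article_categories
              SELECT id, 78, last_updated, comment_cnt > 0, deleted
              FROM articles WHERE id = 1""")
db.execute("UPDATE articles SET comment_cnt = 3, last_updated = 200 WHERE id = 1")
synced = db.execute(
    "SELECT last_updated, has_comments, deleted FROM article_categories").fetchall()
```

After the update, synced holds [(200, 1, 0)]: the trigger has copied the new timestamp and the derived has_comments flag.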

Note that the column has_comments is boolean: this allows using an equality predicate to make a single range scan over the index.

Also note that the LIMIT goes into the subquery. This makes MySQL use late row lookups, which it does not use by default. See this article on my blog about why they increase performance:

MySQL ORDER BY / LIMIT performance: late row lookups
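
The saving from late row lookups can be counted with a toy model. The names and data are hypothetical, and the model ignores the WHERE filter and counts only full-row reads. With LIMIT 100, 20, reading the full row for every index entry touches 120 rows. Skipping the offset on the index alone and fetching only the surviving ids touches 20.

```python
def early_row_lookups(index, table, offset, limit):
    """Read the full row for every index entry up to offset + limit,
    then throw the first `offset` rows away."""
    rows = [table[row_id] for _, row_id in index[:offset + limit]]
    return rows[offset:], len(rows)


def late_row_lookups(index, table, offset, limit):
    """Skip `offset` entries using the index alone, then read full rows
    only for the `limit` ids that survive."""
    ids = [row_id for _, row_id in index[offset:offset + limit]]
    return [table[row_id] for row_id in ids], len(ids)


# Hypothetical data: 1,000 articles; the index holds (last_updated, id) in order.
table = {i: {"id": i, "last_updated": i, "title": f"article {i}"} for i in range(1000)}
index = sorted((row["last_updated"], row_id) for row_id, row in table.items())

early_rows, early_fetches = early_row_lookups(index, table, 100, 20)
late_rows, late_fetches = late_row_lookups(index, table, 100, 20)
```

Both functions return the same 20 rows. The early variant performs 120 full-row reads and the late one performs 20.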

If you were on SQL Server, you could create an indexed view over your query. That would essentially be a denormalized, indexed copy of article_categories with the additional fields, automatically maintained by the server.

Unfortunately, MySQL does not support this, and you will have to create such a table manually and write additional code to keep it in sync with the base tables.

Answer 2

Answered by Denis de Bernardy

Before getting to your specific query, it's important to understand how an index works.

With appropriate statistics, this query:

select * from foo where bar = 'bar'

... will use an index on foo(bar) if the condition is selective. That is, if bar = 'bar' amounts to selecting most of the table's rows, it will be faster to just read the table and eliminate the rows that don't apply. In contrast, if bar = 'bar' selects only a handful of rows, reading the index makes sense.

Suppose we now toss in an ORDER BY clause, and that you have indexes on each of foo(bar) and foo(baz):

select * from foo where bar = 'bar' order by baz

If bar = 'bar' is very selective, it's cheap to grab all rows that comply and to sort them in memory. If it's not at all selective, the index on foo(baz) makes little sense because you'll fetch the entire table anyway: using it would mean going back and forth between disk pages to read the rows in order, which is very expensive.

Toss in a LIMIT clause, however, and foo(baz) might suddenly make sense:

select * from foo where bar = 'bar' order by baz limit 10

If bar = 'bar' is very selective, it's still a good option. If it's not at all selective, you'll quickly find 10 matching rows by scanning the index on foo(baz): you might read 10 rows, or 50, but you'll find 10 good ones soon enough.
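
A small Python sketch makes that trade-off concrete. It assumes the rows already come back in baz order, as if read through foo(baz), and counts how many index entries are read before 10 matches turn up. The data and the function name are made up for illustration.

```python
def first_n_via_order_index(rows_in_baz_order, predicate, n):
    """Walk rows in index order and stop once n of them match the filter.
    Returns the matches and how many index entries were read."""
    matches, entries_read = [], 0
    for row in rows_in_baz_order:
        entries_read += 1
        if predicate(row):
            matches.append(row)
            if len(matches) == n:
                break
    return matches, entries_read


def is_bar(row):
    return row["bar"] == "bar"


# Not selective: every other row has bar = 'bar'.
common = [{"bar": "bar" if i % 2 == 0 else "bad", "baz": i} for i in range(10_000)]
# Very selective: only one row in a thousand has bar = 'bar'.
rare = [{"bar": "bar" if i % 1000 == 0 else "bad", "baz": i} for i in range(10_000)]

_, read_common = first_n_via_order_index(common, is_bar, 10)
_, read_rare = first_n_via_order_index(rare, is_bar, 10)
```

With the non-selective filter the walk stops after 19 entries. With the selective one it reads 9,001, which is the case where fetching the few matching rows and sorting them in memory wins.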

Now suppose the latter query, with indexes on foo(bar, baz) and foo(baz, bar) instead. Indexes are read from left to right. One makes very good sense for this query; the other might make no sense at all. Think of them like this:

bar   baz    baz   bar
---------    ---------
bad   aaa    aaa   bad
bad   bbb    aaa   bar
bar   aaa    bbb   bad
bar   bbb    bbb   bar

As you can see, the index on foo(bar, baz) allows MySQL to start reading at ('bar', 'aaa') and to fetch the rows in order from that point forward.

The index on foo(baz, bar), on the contrary, yields rows sorted by baz irrespective of what bar might hold. If bar = 'bar' is not at all selective as a criterion, you'll quickly run into matching rows for your query, in which case it makes sense to use it. If it's very selective, you may end up iterating over gazillions of rows before finding enough that match bar = 'bar'. It might still be a good option, but it's not as optimal.
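
The two-column case can be sketched the same way, using the four rows from the table above. This is a simplified model: the function names are hypothetical, and sorted Python lists stand in for B-tree indexes. foo(bar, baz) allows a seek followed by a contiguous read. foo(baz, bar) has to be scanned and filtered entry by entry.

```python
from bisect import bisect_left

# The four (bar, baz) rows from the table above.
rows = [("bad", "aaa"), ("bad", "bbb"), ("bar", "aaa"), ("bar", "bbb")]
index_bar_baz = sorted(rows)                              # models foo(bar, baz)
index_baz_bar = sorted((baz, bar) for bar, baz in rows)   # models foo(baz, bar)


def seek_and_read(index, bar_value, limit):
    """foo(bar, baz): jump to the first (bar_value, ...) key and read forward;
    matching keys are contiguous and already ordered by baz."""
    result, touched = [], 0
    for bar, baz in index[bisect_left(index, (bar_value,)):]:
        if bar != bar_value:
            break                      # left the contiguous range for bar_value
        touched += 1
        result.append(baz)
        if len(result) == limit:
            break
    return result, touched


def scan_and_filter(index, bar_value, limit):
    """foo(baz, bar): entries come out ordered by baz, but rows for other bar
    values are interleaved, so each entry has to be read and tested."""
    result, touched = [], 0
    for baz, bar in index:
        touched += 1
        if bar == bar_value:
            result.append(baz)
            if len(result) == limit:
                break
    return result, touched


print(seek_and_read(index_bar_baz, "bar", 10))    # (['aaa', 'bbb'], 2)
print(scan_and_filter(index_baz_bar, "bar", 10))  # (['aaa', 'bbb'], 4)
```

Both return the rows in baz order. The difference is how many non-matching entries must be read along the way, which grows with the table when bar = 'bar' is selective.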

With that being addressed, let's get back to your original query...

You need to join articles with categories to filter articles that are in a particular category, have at least one comment and aren't deleted, then sort them by date and grab a handful of them.

I take it that most articles are not deleted, so an index on that criterion won't be of much use: it'll only slow down writes and query planning.

I presume most articles have at least one comment, so that won't be selective either; there's little need to index it.

Without your category filter, index options are reasonably obvious: articles(last_updated); possibly with the comment count column to the right, and the deleted flag to the left.

With your category filter, it all depends...

If your category filter is very selective, it actually makes very good sense to select all rows that are within that category, sort them in memory, and pick the top matching rows.

If your category filter is not at all selective and yields almost all articles, the index on articles(last_updated) makes sense: valid rows are all over the place, so read rows in order until you find enough that match, and voilà.

In the more general case, it's just vaguely selective. To the best of my knowledge, the stats collected don't look into correlations much. Thus, the planner has no good way to estimate whether it'll find articles with the right category fast enough to be worth reading the latter index. Joining and sorting in memory will usually be cheaper, so the planner goes with that.

Anyway, you've two options to force the use of an index.

One is to acknowledge that the query planner is not perfect and to use a hint:

http://dev.mysql.com/doc/refman/5.5/en/index-hints.html

Be wary, though, because sometimes the planner is actually correct in not wanting to use the index you'd like it to use, or vice versa. Also, it may become correct in a future version of MySQL, so keep that in mind as you maintain your code over the years.

Edit: STRAIGHT_JOIN, as pointed out by DRapp, works too, with similar caveats.

The other is to maintain an extra column to tag frequently selected articles (e.g. a tinyint field, which is set to 1 when they belong to your specific category), and then add an index on e.g. articles(cat_78, last_updated). Maintain it using a trigger and you'll do fine.

Answer 3

Answered by newtover

First of all, I would recommend reading the article 3 ways MySQL uses indexes.

Now that you know the basics, you can optimize this particular query.

MySQL cannot use an index for sorting as such; it can only output data in the order of an index. Since MySQL uses nested loops for joins, the field you want to order by should be in the first table of the join. You can see the join order in the EXPLAIN results, and you can affect it by creating specific indexes and, if that does not help, by forcing the required indexes.
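
A toy nested-loop join in Python shows why the ORDER BY column should come from the first table: the output order is simply the order in which the outer table's rows are read. The data and names are hypothetical, and dicts stand in for index lookups on the inner table.

```python
def nested_loop_join(outer_rows, inner_index, outer_key):
    """For each outer row, in the order its access path yields them, look up
    the matching inner rows through an index (a dict here)."""
    for outer in outer_rows:
        for inner in inner_index.get(outer[outer_key], []):
            yield outer, inner


# Articles arrive in last_updated order, as if read through KEY last_updated.
articles = [
    {"id": 3, "last_updated": 10},
    {"id": 1, "last_updated": 20},
    {"id": 2, "last_updated": 30},
]
categories_by_article = {a["id"]: [{"article_id": a["id"], "category_id": 78}] for a in articles}
articles_by_id = {a["id"]: [a] for a in articles}
# Category rows arrive in primary-key order (article_id, category_id).
category_rows = sorted((c for cs in categories_by_article.values() for c in cs),
                       key=lambda c: (c["article_id"], c["category_id"]))

# articles first: output is already ordered by last_updated, no sort needed
articles_first = [a["last_updated"]
                  for a, _ in nested_loop_join(articles, categories_by_article, "id")]
# article_categories first: output follows article_id, so a sort would be needed
categories_first = [a["last_updated"]
                    for _, a in nested_loop_join(category_rows, articles_by_id, "article_id")]
```

Here articles_first is [10, 20, 30], while categories_first is [20, 30, 10]. The second order is what forces a filesort.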

Another important point is that, before ordering, you fetch all columns for all filtered rows from table a and then probably skip most of them. It is much more efficient to get a list of the required row ids and fetch only those rows.

To make this work, you will need a covering index (deleted, comment_cnt, last_updated) on table a, and then you can rewrite the query as follows:

SELECT *
FROM (
  SELECT a.id
  FROM articles AS a
  JOIN article_categories AS c
    ON a.id = c.article_id AND c.category_id = 78
  WHERE a.comment_cnt > 0 AND a.deleted = 0
  ORDER BY a.last_updated
  LIMIT 100, 20
) as ids
JOIN articles USING (id);
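
To check that the rewrite returns the same rows as the original query, here is a sketch that runs both versions against SQLite through Python's sqlite3 module. SQLite is only a stand-in for MySQL, and its planner is different, so this verifies results, not performance. The sample data is made up. The outer query adds its own ORDER BY, because the order of rows coming out of the outer join is not guaranteed to follow the subquery's ORDER BY unless it is stated again.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT,
        last_updated INTEGER,
        comment_cnt INTEGER,
        deleted INTEGER
    );
    CREATE TABLE article_categories (
        article_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (article_id, category_id)
    );
    CREATE INDEX ix_articles_deleted_comments_updated
        ON articles (deleted, comment_cnt, last_updated);
""")

# Hypothetical data; (i * 7919) % 1000 gives every article a distinct last_updated.
db.executemany("INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
               [(i, f"title {i}", (i * 7919) % 1000, i % 4, int(i % 10 == 0))
                for i in range(1, 1001)])
db.executemany("INSERT INTO article_categories VALUES (?, ?)",
               [(i, 78 if i % 3 else 5) for i in range(1, 1001)])

ORIGINAL = """
    SELECT a.* FROM articles AS a
    JOIN article_categories AS c ON a.id = c.article_id AND c.category_id = 78
    WHERE a.comment_cnt > 0 AND a.deleted = 0
    ORDER BY a.last_updated
    LIMIT 100, 20
"""

# Same filter, but only ids go through ORDER BY / LIMIT; full rows are fetched last.
DEFERRED = """
    SELECT articles.* FROM (
        SELECT a.id FROM articles AS a
        JOIN article_categories AS c ON a.id = c.article_id AND c.category_id = 78
        WHERE a.comment_cnt > 0 AND a.deleted = 0
        ORDER BY a.last_updated
        LIMIT 100, 20
    ) AS ids
    JOIN articles USING (id)
    ORDER BY articles.last_updated
"""

original_rows = db.execute(ORIGINAL).fetchall()
deferred_rows = db.execute(DEFERRED).fetchall()
```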

P.S. Your table definition for table a does not contain the comment_cnt column ;)

Answer 4

Answered by Andomar

Use of a non-covering index is expensive. For each row, any uncovered columns have to be retrieved from the base table, using the primary key. So I'd first try to make the index on articles covering. That might help convince the MySQL query optimizer that the index is useful. For example:

KEY IX_Articles_last_updated (last_updated, id, title, comment_cnt, deleted),

If that doesn't help, you could play around with FORCE INDEX:

SELECT  a.*
FROM    article_categories AS c FORCE INDEX (PRIMARY)
JOIN    articles AS a FORCE INDEX (IX_Articles_last_updated)
ON      a.id = c.article_id
WHERE   c.category_id = 78
        AND a.comment_cnt > 0
        AND a.deleted = 0
ORDER BY 
        a.last_updated
LIMIT   100, 20

The index that enforces the primary key is always named PRIMARY.

Answer 5

Answered by Muhammad Gelbana

You can use index hints to influence MySQL to use particular KEYS or INDEXES

for:

ordering (FOR ORDER BY),
grouping (FOR GROUP BY), or
joining (FOR JOIN)

For extra information, follow this link. I intended to use this for joining (i.e. USE INDEX FOR JOIN (My_Index)), but it didn't work as expected. Removing the FOR JOIN part sped up my query significantly, from more than 3.5 hours to 1-2 seconds, simply because MySQL was forced to use the right index.

Answer 6

Answered by DRapp

I would have the following indexes available:

articles table -- INDEX ( deleted, last_updated, comment_cnt )

article_categories table -- INDEX ( article_id, category_id ) -- you already have this index

Then add STRAIGHT_JOIN to force the query to run in the table order listed, instead of letting the optimizer start from the article_categories table based on whatever statistics it may have.

SELECT STRAIGHT_JOIN
      a.*
   FROM
      articles AS a
         JOIN article_categories AS c
            ON a.id = c.article_id
            AND c.category_id = 78
   WHERE
          a.deleted = 0
      AND a.comment_cnt > 0
   ORDER BY 
      a.last_updated
   LIMIT 
      100, 20

Based on the comments and feedback, if the set of matching category records is much smaller, I would consider reversing the join order, such as:

SELECT STRAIGHT_JOIN
      a.*
   FROM
      article_categories AS c
         JOIN articles as a
            ON c.article_id = a.id
           AND a.deleted = 0
           AND a.Comment_cnt > 0
   WHERE
      c.category_id = 78
   ORDER BY 
      a.last_updated
   LIMIT 
      100, 20

In this case, I would ensure an index on the articles table such as:

index -- (id, deleted, last_updated)
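
Driving the join from the category side means collecting every qualifying article in the category and sorting them in memory (the "Using filesort" step) before the page can be cut. The cost therefore grows with the size of the category rather than with the LIMIT. Here is a minimal Python sketch of that plan; the data and names are hypothetical.

```python
def page_from_category_side(article_ids_in_category, articles_by_id, offset, limit):
    """Join order article_categories -> articles, then sort in memory."""
    candidates = []
    for article_id in article_ids_in_category:     # ids read via KEY category_id
        article = articles_by_id[article_id]       # PRIMARY KEY lookup on articles
        if article["deleted"] == 0 and article["comment_cnt"] > 0:
            candidates.append(article)
    # Every candidate must be sorted before the requested page can be cut.
    candidates.sort(key=lambda a: a["last_updated"])
    return candidates[offset:offset + limit], len(candidates)


# Hypothetical data: 1,000 articles, two thirds of them in category 78.
articles_by_id = {
    i: {"id": i, "last_updated": (i * 7919) % 1000,
        "comment_cnt": i % 4, "deleted": int(i % 10 == 0)}
    for i in range(1, 1001)
}
ids = [i for i in range(1, 1001) if i % 3]
page, total = page_from_category_side(ids, articles_by_id, 100, 20)
```

All `total` candidates are sorted even though only 20 rows are returned. That is fine when the category is small and expensive when it is large.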
