How do I get MySQL to use an INDEX for a view query?

Question

Asked by Roger Ray

I'm working on a Java EE web project backed by a MySQL database. We need a view that summarizes data from 3 tables with more than 3M rows in total. Each table was created with indexes, but I haven't found a way to take advantage of those indexes when running a conditional SELECT against a view that we created with [group by].

People have suggested that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL the way you can in Oracle. But in some tests I ran, indexes can be used in a SELECT against a view. Maybe I've created those views the wrong way.

I'll use an example to describe my problem.

We have a table that records high scores in NBA games, with an index on column [happened_in]:

CREATE  TABLE `highscores` (
   `tbl_id` int(11) NOT NULL auto_increment,
   `happened_in` int(4) default NULL,
   `player` int(3) default NULL,
   `score` int(3) default NULL,
   PRIMARY KEY  (`tbl_id`),
   KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert data (8 rows):

INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);

Then I create a view to see the highest score that Kobe Bryant got in each year:

CREATE OR REPLACE VIEW v_kobe_highscores
AS
   SELECT player, max(score) AS highest_score, happened_in
   FROM highscores
   WHERE player = 24
   GROUP BY happened_in;

I wrote a conditional statement to see the highest score that Kobe got in 2006:

select * from v_kobe_highscores where happened_in = 2006;

When I ran EXPLAIN on it in Toad for MySQL, I found that MySQL scans all rows to form the view and then applies the condition to the result, without using the index on [happened_in].

explain select * from v_kobe_highscores where happened_in = 2006;

[Image: EXPLAIN result]

The view that we use in our project is built on tables with millions of rows. Scanning all the rows of a table on every retrieval from the view is unacceptable. Please help! Thanks!

@zerkms Here is the result of my test on real-life data. I don't see much difference. I think @spencer7593 has the right point: the MySQL optimizer doesn't "push" that predicate down into the view query. [Image: real-life test]

Answer 1

Accepted answer by spencer7593

How do you get MySQL to use an index for a view query? The short answer: provide an index that MySQL can use.

In this case, the optimum index is likely a "covering" index:

... ON highscores (player, happened_in, score)

It's likely that MySQL will use that index, and the EXPLAIN will show "Using index", due to the WHERE player = 24 (an equality predicate on the leading column in the index). The GROUP BY happened_in (the second column in the index) may allow MySQL to optimize the grouping by using the index to avoid a sort operation. Including the score column in the index allows the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
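
As a minimal sketch, the covering index described here could be created as follows. The index name highscores_cover_ix is a hypothetical choice, not something from the original schema. The EXPLAIN runs the SELECT from the view definition on its own, so the Extra column can be checked for "Using index".

```sql
-- Hypothetical index name. Column order follows the reasoning in the answer:
-- equality column first, then the GROUP BY column, then the aggregated column.
CREATE INDEX highscores_cover_ix
    ON highscores (player, happened_in, score);

-- Check the plan of the query that forms the view definition.
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in;
```

With only the eight sample rows the optimizer may pick a different plan, so it is worth checking against a data volume closer to the real tables.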

That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.

Why the view causes a performance issue

One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.

Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider that predicate when it runs the inner "view query". The query for the view gets executed separately, before the outer query. The result set from the execution of that query gets "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name makes sense once you understand the operations that MySQL performs.)

The bottom line is that the index you have defined on happened_in is not being used by MySQL when it runs the query that forms the view definition.

After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a row source. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.

Note that all of the rows from the view query are stored, which (in your case) is a row for EVERY value of happened_in, not just the one value you specify an equality predicate on in the outer query.
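
Conceptually, the processing order described here resembles writing the view as an inline derived table. The following is only an illustrative sketch of that order, not literal optimizer output, and the alias v is an assumption made for the example:

```sql
SELECT v.*
  FROM ( -- inner "view query": runs first and covers every year for player 24
         SELECT player, MAX(score) AS highest_score, happened_in
           FROM highscores
          WHERE player = 24
          GROUP BY happened_in
       ) v
 WHERE v.happened_in = 2006;  -- evaluated only against the materialized rows
```

Written this way, it is easier to see why an index on happened_in in the base table does not help the outer predicate: by the time it is evaluated, the rows come from the intermediate table, not from highscores.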

The way that view queries are processed may be "unexpected" by some, and this is one reason that using "views" in MySQL can lead to performance problems, as compared to the way view queries are processed by other relational databases.

Improving performance of the view query with a suitable covering index

Given your view definition and your query, about the best you are going to get is a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.

... ON highscores (player, happened_in, score).

That's likely to be the most beneficial index (performance-wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".

Compare performance to standalone query with no derived table

You could compare the execution plan for your query against the view vs. an equivalent standalone query:

SELECT player
     , MAX(score) AS highest_score
     , happened_in
 FROM highscores
WHERE player = 24
  AND happened_in = 2006
GROUP
   BY player
    , happened_in

The standalone query can also make use of a covering index, e.g.

... ON highscores (player, happened_in, score)

but without a need to materialize an intermediate MyISAM table.
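
One way to make this comparison is to run EXPLAIN on both statements side by side. This is a sketch that assumes the view is reachable under the name used in the question's own query (v_kobe_highscores). A row with select_type DERIVED in the first plan would typically indicate that the view was materialized before the outer predicate was applied.

```sql
-- Plan for the query against the view
EXPLAIN
SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;

-- Plan for the equivalent standalone query
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
   AND happened_in = 2006
 GROUP BY player, happened_in;
```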

I am not sure that any of the previous provides a direct answer to the question you were asking.