MySQL: When should I use a composite index?

Question

Asked by Teddy

When should I use a composite index in a database?
What are the performance ramifications of using a composite index?
Why should I use a composite index?

For example, I have a homes table:

CREATE TABLE IF NOT EXISTS `homes` (
  `home_id` int(10) unsigned NOT NULL auto_increment,
  `sqft` smallint(5) unsigned NOT NULL,
  `year_built` smallint(5) unsigned NOT NULL,
  `geolat` decimal(10,6) default NULL,
  `geolng` decimal(10,6) default NULL,
  PRIMARY KEY  (`home_id`),
  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`)
) ENGINE=InnoDB  ;

Does it make sense for me to use a composite index on both geolat and geolng, such that:

I replace:

  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`),

with:

KEY `geolat_geolng` (`geolat`, `geolng`)

If so:

Why?
What are the performance ramifications of using a composite index?

UPDATE:

Since many people have said that it depends entirely on the queries I run, here is the most common query:

SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???

UPDATE 2:

With the following database schema:

CREATE TABLE IF NOT EXISTS `homes` (
  `home_id` int(10) unsigned NOT NULL auto_increment,
  `primary_photo_group_id` int(10) unsigned NOT NULL default '0',
  `customer_id` bigint(20) unsigned NOT NULL,
  `account_type_id` int(11) NOT NULL,
  `address` varchar(128) collate utf8_unicode_ci NOT NULL,
  `city` varchar(64) collate utf8_unicode_ci NOT NULL,
  `state` varchar(2) collate utf8_unicode_ci NOT NULL,
  `zip` mediumint(8) unsigned NOT NULL,
  `price` mediumint(8) unsigned NOT NULL,
  `sqft` smallint(5) unsigned NOT NULL,
  `year_built` smallint(5) unsigned NOT NULL,
  `num_of_beds` tinyint(3) unsigned NOT NULL,
  `num_of_baths` decimal(3,1) unsigned NOT NULL,
  `num_of_floors` tinyint(3) unsigned NOT NULL,
  `description` text collate utf8_unicode_ci,
  `geolat` decimal(10,6) default NULL,
  `geolng` decimal(10,6) default NULL,
  `display_status` tinyint(1) NOT NULL,
  `date_listed` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `contact_email` varchar(100) collate utf8_unicode_ci NOT NULL,
  `contact_phone_number` varchar(15) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`home_id`),
  KEY `customer_id` (`customer_id`),
  KEY `city` (`city`),
  KEY `num_of_beds` (`num_of_beds`),
  KEY `num_of_baths` (`num_of_baths`),
  KEY `geolat` (`geolat`),
  KEY `geolng` (`geolng`),
  KEY `account_type_id` (`account_type_id`),
  KEY `display_status` (`display_status`),
  KEY `sqft` (`sqft`),
  KEY `price` (`price`),
  KEY `primary_photo_group_id` (`primary_photo_group_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8 ;

Using the following SQL:

EXPLAIN SELECT  homes.home_id,
                    address,
                    city,
                    state,
                    zip,
                    price,
                    sqft,
                    year_built,
                    account_type_id,
                    num_of_beds,
                    num_of_baths,
                    geolat,
                    geolng,
                    photo_id,
                    photo_url_dir
            FROM homes
            LEFT OUTER JOIN home_photos ON homes.home_id = home_photos.home_id
                AND homes.primary_photo_group_id = home_photos.home_photo_group_id
                AND home_photos.home_photo_type_id = 2
            WHERE homes.display_status = true
            AND homes.geolat BETWEEN -100 AND 100
            AND homes.geolng BETWEEN -100 AND 100

EXPLAIN returns:

id  select_type  table        type  possible_keys                                   key                  key_len  ref                           rows  Extra
--  -----------  -----------  ----  ----------------------------------------------  -------------------  -------  ----------------------------  ----  -----------
1   SIMPLE       homes        ref   geolat,geolng,display_status                    display_status       1        const                         2     Using where
1   SIMPLE       home_photos  ref   home_id,home_photo_type_id,home_photo_group_id  home_photo_group_id  4        homes.primary_photo_group_id  4

I don't quite understand how to read the EXPLAIN output. Does this look good or bad? Right now, I am NOT using a composite index for geolat and geolng. Should I be?

Answer 1

Answered by Mark Canlas

You should use a composite index when you are using queries that benefit from it. A composite index that looks like this:

index( column_A, column_B, column_C )

will benefit a query that uses those fields for joining, filtering, and sometimes selecting. It will also benefit queries that use a left-most subset of the columns in that composite index. So the index above will also satisfy queries that need:

index( column_A, column_B, column_C )
index( column_A, column_B )
index( column_A )

But it will not help queries that need the following (at least not directly; it may help partially if there is no better index):

index( column_A, column_C )

Notice how column_B is missing.

In your original example, a composite index on two dimensions will mostly benefit queries that filter on both dimensions, or on the leftmost dimension by itself, but not on the rightmost dimension by itself. If you are always querying both dimensions, a composite index is the way to go, and it most probably does not matter which column comes first.
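
The left-most subset rule can be sketched in a few lines of Python. This is only a toy model of how an ordered index is consulted, not MySQL's optimizer; the function name usable_prefix is made up for illustration.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality filters can use.

    Toy model: an index ordered by (A, B, C) can only narrow the search
    on a column if every column before it is also constrained.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        prefix.append(column)
    return prefix


index = ["column_A", "column_B", "column_C"]

print(usable_prefix(index, {"column_A", "column_B", "column_C"}))
print(usable_prefix(index, {"column_A", "column_B"}))
print(usable_prefix(index, {"column_A", "column_C"}))  # only column_A helps
print(usable_prefix(index, {"column_B", "column_C"}))  # nothing usable
```

In this model a filter on column_A and column_C still gets some help from the index, but only for the column_A part, which matches the "maybe it can help partially" remark.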

Answer 2

Answered by Emre Yazici

Imagine you have the following three queries:

Query I:

SELECT * FROM homes WHERE `geolat`=42.9 AND `geolng`=36.4

Query II:

SELECT * FROM homes WHERE `geolat`=42.9

Query III:

SELECT * FROM homes WHERE `geolng`=36.4

If you have a separate index per column, all three queries can use an index. In MySQL, if you have a composite index on (geolat, geolng), only Query I and Query II (which uses the first part of the composite index) can use it. In this case, Query III requires a full table scan.

The Multiple-Column Indexes section of the manual explains clearly how multiple-column indexes work, so I won't retype the manual here.

From the MySQL Reference Manual page:

A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.

If you use separate indexes for the geolat and geolng columns, you have two different indexes on your table, which you can search independently.

INDEX geolat
-----------
VALUE RRN
36.4  1
36.4  8
36.6  2
37.8  3
37.8  12
41.4  4

INDEX geolng
-----------
VALUE RRN
26.1  1
26.1  8
29.6  2
29.6  3
30.1  12
34.7  4

If you use a composite index, you have only one index for both columns:

INDEX (geolat, geolng)
-----------
VALUE      RRN
36.4,26.1  1
36.4,26.1  8
36.6,29.6  2
37.8,29.6  3
37.8,30.1  12
41.4,34.7  4

RRN is the relative record number (to simplify, think of it as the ID). The first two indexes are built separately and the third is composite. As you can see, you cannot search by geolng alone on the composite index, since it is ordered by geolat first; however, it is possible to search by geolat or by "geolat AND geolng" (since geolng is the second-level key).
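
The "sorted array" picture from the manual can be tried out directly. Below is a minimal Python sketch that keeps the composite entries from the table above in a sorted list and searches it with the standard bisect module. It is a model of the ordering only, not of MySQL's B-tree, and the function names are hypothetical.

```python
from bisect import bisect_left

# (geolat, geolng, RRN) entries, kept sorted like the composite index above.
composite = [
    (36.4, 26.1, 1),
    (36.4, 26.1, 8),
    (36.6, 29.6, 2),
    (37.8, 29.6, 3),
    (37.8, 30.1, 12),
    (41.4, 34.7, 4),
]
INF = float("inf")


def find_by_geolat(entries, lat):
    """Binary search works: entries are ordered by geolat first."""
    lo = bisect_left(entries, (lat,))
    hi = bisect_left(entries, (lat, INF))
    return [rrn for _, _, rrn in entries[lo:hi]]


def find_by_both(entries, lat, lng):
    """Binary search works: geolng is ordered within each geolat value."""
    lo = bisect_left(entries, (lat, lng))
    hi = bisect_left(entries, (lat, lng, INF))
    return [rrn for _, _, rrn in entries[lo:hi]]


def find_by_geolng(entries, lng):
    """No shortcut: geolng alone is not in order, so every entry is checked."""
    return [rrn for _, entry_lng, rrn in entries if entry_lng == lng]


print(find_by_geolat(composite, 36.4))
print(find_by_both(composite, 37.8, 30.1))
print(find_by_geolng(composite, 29.6))
```

The first two lookups touch only a small slice of the list, while the geolng-only lookup has to walk all six entries, which is the toy equivalent of a full scan.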

Also, have a look at the How MySQL Uses Indexes section of the manual.

Answer 3

Answered by Question Overflow

There could be a misconception about what a composite index does. Many people think that a composite index can be used to optimise a search query as long as the WHERE clause covers the indexed columns, in your case geolat and geolng. Let's delve deeper:

I believe your data on the coordinates of homes would be random decimals such as:

home_id  geolat  geolng
   1    20.1243  50.4521
   2    22.6456  51.1564
   3    13.5464  45.4562
   4    55.5642 166.5756
   5    24.2624  27.4564
   6    62.1564  24.2542
...

Since geolat and geolng values hardly ever repeat, a composite index on geolat and geolng would look something like this:

index_id  geolat  geolng
   1     20.1243  50.4521
   2     20.1244  61.1564
   3     20.1251  55.4562
   4     20.1293  66.5756
   5     20.1302  57.4564
   6     20.1311  54.2542
...

Therefore, for a range query like yours, the second column of the composite index is basically useless! The speed of your query with a composite index is probably going to be similar to its speed with an index on just the geolat column.

As mentioned by Will, MySQL provides spatial extension support. A spatial point is stored in a single column instead of two separate lat/lng columns. A spatial index can be applied to such a column. However, based on my personal experience, its efficiency may be overrated. It could be that a spatial index does not resolve the two-dimensional problem but merely speeds up the search using R-trees with quadratic splitting.

The trade-off is that a spatial point consumes much more memory, as it uses eight-byte double-precision numbers to store coordinates. Correct me if I am wrong.

Answer 4

Answered by ProfileTwist

Composite indexes are very powerful as they:

Enforce structural integrity
Enable sorting on a FILTERED id

ENFORCE STRUCTURAL INTEGRITY

Composite indexes are not just another type of index; they can provide NECESSARY structure to a table by enforcing integrity as the primary key.

MySQL's InnoDB supports clustering, and the following example illustrates why a composite index may be necessary.

To create a friends table (e.g. for a social network) we need two columns: user_id and friend_id.

Table Structure

user_id (mediumint)
friend_id (mediumint)

Primary Key -> (user_id, friend_id)

By definition, a primary key (PK) is unique, so by creating a composite PK, InnoDB will automatically check that no duplicate (user_id, friend_id) pair exists when a new record is added. This is the expected behavior, as no user should have more than one record (relationship link) with friend_id = 2, for instance.

Without a composite PK, we can create this schema using a surrogate key:

user_friend_id
user_id
friend_id

Primary Key -> (user_friend_id)

Now, whenever a new record is added, we will have to check ourselves that a prior record with the combination (user_id, friend_id) does not already exist.

As such, a composite index can enforce structural integrity.
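
The duplicate check can be observed with a small runnable sketch. Note the assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB, so only the uniqueness behaviour of a composite primary key is being illustrated, and the helper add_friend is a made-up name.

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB here; only the uniqueness
# behaviour of a composite primary key is being illustrated.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friends (
        user_id   INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)
    )
    """
)


def add_friend(user_id, friend_id):
    """Insert a link; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)",
                (user_id, friend_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_friend(1, 2))  # new pair
print(add_friend(1, 3))  # same user, different friend
print(add_friend(1, 2))  # duplicate pair is rejected by the key
```

No separate "does this pair exist?" query is needed before the insert; the key itself refuses the duplicate.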

ENABLE SORTING ON A FILTERED ID

It is very common to sort a set of records by post time (timestamp or datetime). Usually, this is done for the posts belonging to a given id. Here is an example:

Table User_Wall_Posts (think of Facebook's wall posts)

user_id (mediumint)
timestamp (timestamp)
author_id (mediumint)
comment_post (text)

Primary Key -> (user_id, timestamp, author_id)

We want to query and find all posts for user_id = 10 and sort the comment posts by timestamp (date).

SQL QUERY

SELECT * FROM User_Wall_Posts WHERE user_id = 10 ORDER BY `timestamp` DESC;

The composite PK enables MySQL to filter and sort the results using the index; MySQL will not have to use a temporary file or filesort to fetch the results. Without a composite key, this would not be possible, and the query would be very inefficient.
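
Why no sort step is needed can be seen in a toy model. The sketch below keeps entries in the same order as PRIMARY KEY (user_id, timestamp, author_id); the sample rows, the integer timestamps, and the function name are all invented for illustration, and it assumes integer user ids.

```python
from bisect import bisect_left

# Entries ordered like PRIMARY KEY (user_id, timestamp, author_id).
# Timestamps are plain integers purely for illustration.
wall_index = sorted([
    (10, 1005, 7),
    (11, 1001, 3),
    (10, 1001, 4),
    (9, 1003, 10),
    (10, 1009, 2),
])


def posts_newest_first(index, user_id):
    """Slice out one user's entries; they are already in timestamp order."""
    lo = bisect_left(index, (user_id,))
    hi = bisect_left(index, (user_id + 1,))  # assumes integer ids
    return index[lo:hi][::-1]  # read the slice backwards, no sort needed


print(posts_newest_first(wall_index, 10))
```

Filtering on user_id picks one contiguous run of the ordered entries, and within that run the timestamps are already ascending, so ORDER BY timestamp DESC is just a backwards read.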

As such, composite keys are very powerful and suit more than the simple problem of "I want to search for column_a, column_b, so I will use a composite key". In my current database schema, I have just as many composite keys as single-column keys. Don't overlook a composite key's uses!

Answer 5

Answered by Rick James

Composite indexes are useful for

0 or more "=" clauses, plus
at most one range clause.

A composite index cannot handle two ranges. I discuss this further in my index cookbook.

Find nearest -- If the question is really about optimizing

WHERE geolat BETWEEN ??? AND ???
  AND geolng BETWEEN ??? AND ???

then no index can really handle both dimensions.
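
One way to see why, as a toy model rather than MySQL's actual B-tree code: in a list ordered by (geolat, geolng), a latitude range selects one contiguous slice, but the longitudes inside that slice are in no useful order, so each entry has to be checked. The sample coordinates and the function name below are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")


def bounding_box_scan(entries, lat_min, lat_max, lng_min, lng_max):
    """Scan a (geolat, geolng)-ordered list for a bounding box.

    Returns (matches, examined): the ordering narrows the scan only
    to the latitude range; longitude is checked entry by entry.
    """
    lo = bisect_left(entries, (lat_min,))
    hi = bisect_right(entries, (lat_max, INF))
    candidates = entries[lo:hi]
    matches = [(lat, lng) for lat, lng in candidates
               if lng_min <= lng <= lng_max]
    return matches, len(candidates)


entries = sorted([
    (20.1243, 50.4521),
    (22.6456, 51.1564),
    (13.5464, 45.4562),
    (55.5642, 166.5756),
    (24.2624, 27.4564),
    (62.1564, 24.2542),
])

matches, examined = bounding_box_scan(entries, 20, 60, 20, 30)
print(matches, examined)
```

Here four entries fall in the latitude range and all four are examined, although only one of them also satisfies the longitude range.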

Instead, one must 'think out of the box'. If one dimension is implemented via partitioning and the other is implemented by carefully picking the PRIMARY KEY, one can get significantly better efficiency for lat/lng lookups on very large tables. My latlng blog goes into the details of how to implement "find nearest" on the globe. It includes code.

The PARTITIONs are stripes of latitude ranges. The PRIMARY KEY deliberately starts with longitude so that the useful rows are likely to be in the same block. A stored routine orchestrates the messy code for doing ORDER BY ... LIMIT ... and for growing the 'square' around the target until you have enough coffee shops (or whatever). It also takes care of the great-circle calculations and of handling the dateline and poles.
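
The "growing the square" idea can be sketched in plain Python. This is not the blog's stored routine: it is a simplified model that scans an in-memory list, uses flat (planar) distance in degrees instead of great-circle distance, and ignores the dateline and poles. All names and sample points are hypothetical.

```python
import math


def find_nearest(points, lat, lng, wanted, start=0.5, limit=180.0):
    """Return the `wanted` nearest points by growing a square search box."""

    def in_square(half_width):
        return [p for p in points
                if abs(p[0] - lat) <= half_width
                and abs(p[1] - lng) <= half_width]

    def distance(p):
        return math.hypot(p[0] - lat, p[1] - lng)

    half_width = start
    found = in_square(half_width)
    while len(found) < wanted and half_width < limit:
        half_width *= 2  # not enough hits yet: grow the square
        found = in_square(half_width)

    found.sort(key=distance)
    if len(found) >= wanted:
        # A corner of the square is farther away than a point just outside
        # an edge, so re-check a square wide enough to hold the whole circle.
        reach = distance(found[wanted - 1])
        found = sorted(in_square(reach), key=distance)
    return found[:wanted]


shops = [(0.0, 0.0), (0.45, 0.45), (0.6, 0.0), (3.0, 3.0)]
print(find_nearest(shops, 0.0, 0.0, 2))
```

In the sample, the first square already holds two points, but the final re-check finds that (0.6, 0.0) is closer than the corner point (0.45, 0.45). In a real table each in_square call would be a bounding-box query that the partitioning and primary key are designed to make cheap.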

More

I have written another blog; it compares 5 ways of doing lat/lng searches: http://mysql.rjweb.org/doc.php/latlng#representation_choices (It references the link given above as one of the 5.) One of the other ways is the following pair of indexes, and the blog points out that they are optimal for that particular case:

INDEX(geolat, geolng),
INDEX(geolng, geolat)

That is, having both columns in two indexes, and not having single-column indexes on geolat and geolng, is important.

Answer 6

Answered by Alexander

A composite index can be useful when you want to optimise a GROUP BY clause (see this article: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html). Please note:

The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index).

Answer 7

Answered by Will

To do spatial searches, you need an R-Tree algorithm, which allows searching geographical areas very quickly. Exactly what you need for this job.

Some databases have spatial indexes built in. A quick Google search shows that MySQL 5 has them (and, looking at your SQL, I'm guessing you're using MySQL).

Answer 8

Answered by Mitch Wheat

There is no black-and-white, one-size-fits-all answer.

You should use a composite index when your query workload would benefit from one.

You need to profile your query workload in order to determine this.

A composite index comes into play when queries can be satisfied entirely from that index.

UPDATE (in response to the edit to the posted question): If you are selecting * from the table, the composite index may be used, or it may not. You will need to run EXPLAIN to be sure.

Answer 9

Answered by Jim Ferrans

I'm with @Mitch: it depends entirely on your queries. Fortunately you can create and drop indexes at any time, and you can prepend the EXPLAIN keyword to your queries to see whether the query analyzer uses the indexes.

If you'll be looking up an exact lat/long pair, this index would likely make sense. But you're probably going to be looking for homes within a certain distance of a particular place, so your queries will look something like this (see source):

select *, sqrt(  pow(h2.geolat - h1.geolat,  2) 
               + pow(h2.geolng - h1.geolng, 2) ) as distance
from homes h1, homes h2
where h1.home_id = 12345 and h2.home_id != h1.home_id
order by distance

and the index very likely won't be helpful at all. For geospatial queries, you need something like this.

Update: with this query:

SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???

The query analyzer could use an index on geolat alone, or an index on geolng alone, or possibly both indexes. I don't think it would use a composite index. But it's easy to try out each of these permutations on a real data set and then (a) see what EXPLAIN tells you and (b) measure the time the query really takes.
