MySQL: When should I use a composite index?
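Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow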
Original URL: http://stackoverflow.com/questions/1823685/
When should I use a composite index?
Asked by Teddy
- When should I use a composite index in a database?
- What are the performance ramifications of using a composite index?
- Why should I use a composite index?
For example, I have a homes table:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
PRIMARY KEY (`home_id`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`)
) ENGINE=InnoDB ;
Does it make sense for me to use a composite index on both geolat and geolng, such that:
I replace:
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
with:
KEY `geolat_geolng` (`geolat`, `geolng`)
If so:
- Why?
- What are the performance ramifications of using a composite index?
UPDATE:
Since many people have said that it depends entirely on the queries I perform, below is the most common query:
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
UPDATE 2:
With the following database schema:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`primary_photo_group_id` int(10) unsigned NOT NULL default '0',
`customer_id` bigint(20) unsigned NOT NULL,
`account_type_id` int(11) NOT NULL,
`address` varchar(128) collate utf8_unicode_ci NOT NULL,
`city` varchar(64) collate utf8_unicode_ci NOT NULL,
`state` varchar(2) collate utf8_unicode_ci NOT NULL,
`zip` mediumint(8) unsigned NOT NULL,
`price` mediumint(8) unsigned NOT NULL,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`num_of_beds` tinyint(3) unsigned NOT NULL,
`num_of_baths` decimal(3,1) unsigned NOT NULL,
`num_of_floors` tinyint(3) unsigned NOT NULL,
`description` text collate utf8_unicode_ci,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
`display_status` tinyint(1) NOT NULL,
`date_listed` timestamp NOT NULL default CURRENT_TIMESTAMP,
`contact_email` varchar(100) collate utf8_unicode_ci NOT NULL,
`contact_phone_number` varchar(15) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`home_id`),
KEY `customer_id` (`customer_id`),
KEY `city` (`city`),
KEY `num_of_beds` (`num_of_beds`),
KEY `num_of_baths` (`num_of_baths`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`),
KEY `account_type_id` (`account_type_id`),
KEY `display_status` (`display_status`),
KEY `sqft` (`sqft`),
KEY `price` (`price`),
KEY `primary_photo_group_id` (`primary_photo_group_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=8 ;
Using the following SQL:
EXPLAIN SELECT homes.home_id,
address,
city,
state,
zip,
price,
sqft,
year_built,
account_type_id,
num_of_beds,
num_of_baths,
geolat,
geolng,
photo_id,
photo_url_dir
FROM homes
LEFT OUTER JOIN home_photos ON homes.home_id = home_photos.home_id
AND homes.primary_photo_group_id = home_photos.home_photo_group_id
AND home_photos.home_photo_type_id = 2
WHERE homes.display_status = true
AND homes.geolat BETWEEN -100 AND 100
AND homes.geolng BETWEEN -100 AND 100
EXPLAIN returns:
id  select_type  table        type  possible_keys                                   key                  key_len  ref                           rows  Extra
--  -----------  -----------  ----  ----------------------------------------------  -------------------  -------  ----------------------------  ----  -----------
1   SIMPLE       homes        ref   geolat,geolng,display_status                    display_status       1        const                         2     Using where
1   SIMPLE       home_photos  ref   home_id,home_photo_type_id,home_photo_group_id  home_photo_group_id  4        homes.primary_photo_group_id  4
I don't quite understand how to read the EXPLAIN output. Does this look good or bad? Right now, I am NOT using a composite index on geolat and geolng. Should I be?
Answer by Mark Canlas
You should use a composite index when you are using queries that benefit from it. A composite index that looks like this:
index( column_A, column_B, column_C )
will benefit a query that uses those fields for joining, filtering, and sometimes selecting. It will also benefit queries that use a left-most prefix of the columns in that composite. So the above index will also satisfy queries that need
index( column_A, column_B, column_C )
index( column_A, column_B )
index( column_A )
But it will not help (at least not directly; it may help partially if there are no better indexes) queries that need
index( column_A, column_C )
Notice how column_B is missing.
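To see the leftmost-prefix rule in action, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL: SQLite's B-tree indexes follow the same leftmost-prefix rule, but the plan text it prints is SQLite's EXPLAIN QUERY PLAN format, not MySQL's EXPLAIN. The table t, its payload column and the index name idx_abc are hypothetical.

```python
import sqlite3

# Hypothetical table; "payload" keeps the index from covering SELECT *.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, column_A INT, column_B INT,"
    " column_C INT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (column_A, column_B, column_C)")


def plan(where):
    """Return SQLite's plan text for SELECT * FROM t WHERE <where>."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where)
    return " | ".join(row[-1] for row in rows)


for where in [
    "column_A = 1 AND column_B = 2 AND column_C = 3",  # whole index
    "column_A = 1 AND column_B = 2",                   # leftmost two columns
    "column_A = 1",                                    # leftmost column
    "column_A = 1 AND column_C = 3",                   # gap: column_B missing
    "column_B = 2 AND column_C = 3",                   # leftmost column missing
]:
    print(f"{where:48} -> {plan(where)}")
```

In the column_A/column_C case, only column_A=? shows up in the index search; column_C is just checked on the rows that prefix finds. Without column_A, the planner falls back to a full scan.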
In your original example, a composite index on two dimensions will mostly benefit queries that filter on both dimensions or on the leftmost dimension by itself, but not on the rightmost dimension by itself. If you're always querying both dimensions, a composite index is the way to go, and it most probably doesn't matter which column comes first.
Answer by Emre Yazici
Imagine you have the following three queries:
Query I:
SELECT * FROM homes WHERE `geolat`=42.9 AND `geolng`=36.4
Query II:
SELECT * FROM homes WHERE `geolat`=42.9
Query III:
SELECT * FROM homes WHERE `geolng`=36.4
If you have a separate index per column, all three queries can use an index. In MySQL, if you have a composite index (geolat, geolng), only query I and query II (which uses the first part of the composite index) can use it. In this case, query III requires a full table scan.
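A quick way to check this is a sketch like the following, again assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and a hypothetical homes table that has only the composite index geolat_geolng. The exact plan wording is SQLite's and will differ from MySQL's EXPLAIN output.

```python
import sqlite3

# Hypothetical homes table with ONLY the composite index from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, sqft INT,"
    " geolat REAL, geolng REAL)"
)
conn.execute("CREATE INDEX geolat_geolng ON homes (geolat, geolng)")

QUERIES = {
    "I": "SELECT * FROM homes WHERE geolat = 42.9 AND geolng = 36.4",
    "II": "SELECT * FROM homes WHERE geolat = 42.9",
    "III": "SELECT * FROM homes WHERE geolng = 36.4",
}


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


for name, sql in QUERIES.items():
    print(f"Query {name}: {plan(sql)}")
```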
The Multiple-Column Indexes section of the manual clearly explains how multiple-column indexes work, so I don't want to retype the manual.
From the MySQL Reference Manual page:
A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.
If you use separate indexes for the geolat and geolng columns, you have two different indexes in your table, which can be searched independently.
INDEX geolat
-----------
VALUE RRN
36.4 1
36.4 8
36.6 2
37.8 3
37.8 12
41.4 4
INDEX geolng
-----------
VALUE RRN
26.1 1
26.1 8
29.6 2
29.6 3
30.1 12
34.7 4
If you use composite index you have only one index for both columns:
INDEX (geolat, geolng)
-----------
VALUE RRN
36.4,26.1 1
36.4,26.1 8
36.6,29.6 2
37.8,29.6 3
37.8,30.1 12
41.4,34.7 4
RRN is the relative record number (to simplify, you can think of it as the ID). The first two indexes are separate and the third index is composite. As you can see, you cannot search by geolng alone on the composite one, since it is ordered by geolat first; however, it is possible to search by geolat or by "geolat AND geolng" (since geolng is the second-level key).
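The sorted-array model above can be sketched directly in Python using the sample rows from these tables. This is only a model (a real B-tree is not a Python list), and the helper names by_geolat, by_geolat_and_geolng and by_geolng are hypothetical; it shows why the composite index supports binary search on geolat and on (geolat, geolng) but needs a full pass for geolng alone.

```python
from bisect import bisect_left, bisect_right

# The composite index from the example, as a sorted list of ((geolat, geolng), rrn).
# Python compares tuples column by column, like the concatenated index values.
COMPOSITE = sorted([
    ((36.4, 26.1), 1), ((36.4, 26.1), 8), ((36.6, 29.6), 2),
    ((37.8, 29.6), 3), ((37.8, 30.1), 12), ((41.4, 34.7), 4),
])
KEYS = [key for key, _ in COMPOSITE]


def by_geolat(lat):
    """Binary search on the first column: every entry whose geolat == lat."""
    lo = bisect_left(KEYS, (lat,))               # (lat,) sorts before (lat, anything)
    hi = bisect_left(KEYS, (lat, float("inf")))  # first entry with a larger geolat
    return [rrn for _, rrn in COMPOSITE[lo:hi]]


def by_geolat_and_geolng(lat, lng):
    """Binary search on both columns: the exact (geolat, geolng) pair."""
    lo = bisect_left(KEYS, (lat, lng))
    hi = bisect_right(KEYS, (lat, lng))
    return [rrn for _, rrn in COMPOSITE[lo:hi]]


def by_geolng(lng):
    """geolng alone is not sorted in this index, so every entry must be scanned."""
    return [rrn for (_, entry_lng), rrn in COMPOSITE if entry_lng == lng]


print(by_geolat(37.8), by_geolat_and_geolng(37.8, 29.6), by_geolng(29.6))
# prints: [3, 12] [3] [2, 3]
```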
Also, have a look at the How MySQL Uses Indexes section of the manual.
Answer by Question Overflow
There may be a misconception about what a composite index does. Many people think that a composite index can be used to optimise a search query as long as the WHERE clause covers the indexed columns, in your case geolat and geolng. Let's delve deeper:
I believe your home coordinates would be random decimals such as these:
home_id geolat geolng
1 20.1243 50.4521
2 22.6456 51.1564
3 13.5464 45.4562
4 55.5642 166.5756
5 24.2624 27.4564
6 62.1564 24.2542
...
Since geolat and geolng values hardly ever repeat, a composite index on geolat and geolng would look something like this:
index_id geolat geolng
1 20.1243 50.4521
2 20.1244 61.1564
3 20.1251 55.4562
4 20.1293 66.5756
5 20.1302 57.4564
6 20.1311 54.2542
...
Therefore the second column of the composite index is basically useless! The speed of your query with a composite index is probably going to be similar to that with an index on just the geolat column.
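To illustrate the point about the second column, here is a rough Python model: a sorted list standing in for the composite index and a bounding-box search over randomly generated (hypothetical) coordinates. It counts how many index entries have to be examined; under this model that number depends only on the geolat range, whatever the geolng bounds are.

```python
import random
from bisect import bisect_left, bisect_right

# Hypothetical data: 10,000 homes with random, practically unique coordinates.
rng = random.Random(0)
HOMES = [(rng.uniform(-90, 90), rng.uniform(-180, 180), home_id)
         for home_id in range(10_000)]
INDEX = sorted(HOMES)  # models a composite index on (geolat, geolng)


def bbox_search(lat_lo, lat_hi, lng_lo, lng_hi):
    """Return (matching home_ids, number of index entries examined)."""
    start = bisect_left(INDEX, (lat_lo,))
    stop = bisect_right(INDEX, (lat_hi, float("inf")))
    examined = INDEX[start:stop]  # the geolat range is one contiguous slice...
    # ...but geolng is effectively unordered inside it, so each entry is checked.
    matches = [home_id for _, lng, home_id in examined if lng_lo <= lng <= lng_hi]
    return matches, len(examined)


narrow, narrow_cost = bbox_search(10, 11, 0, 1)
wide, wide_cost = bbox_search(10, 11, -180, 180)
print(len(narrow), narrow_cost, len(wide), wide_cost)
```

Narrowing the geolng bounds shrinks the result but not the work: both searches walk the same slice of the index.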
As mentioned by Will, MySQL provides spatial extension support. A spatial point is stored in a single column instead of two separate lat and lng columns, and a spatial index can be applied to such a column. However, based on my personal experience, its efficiency may be overrated. It could be that a spatial index does not solve the two-dimensional problem but merely speeds up the search using R-Trees with quadratic splitting.
The trade-off is that a spatial point consumes much more memory, as it uses eight-byte double-precision numbers to store its coordinates. Correct me if I am wrong.
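To put rough numbers on this trade-off, the arithmetic below compares per-value storage. It rests on assumptions taken from the storage rules described in the MySQL reference manual: DECIMAL packs nine decimal digits into four bytes, with the integer and fractional parts sized separately, and a geometry value is stored as a 4-byte SRID followed by its WKB encoding (for a point, 1 byte order + 4 byte type + two 8-byte doubles). Row and index overhead are ignored, so treat the figures as a sketch, not measurements.

```python
# Bytes needed for 0-8 leftover digits after whole 9-digit chunks (assumed MySQL rule).
LEFTOVER_BYTES = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def decimal_bytes(precision, scale):
    """Approximate storage of one DECIMAL(precision, scale) value."""
    def part(digits):
        chunks, leftover = divmod(digits, 9)
        return chunks * 4 + LEFTOVER_BYTES[leftover]
    return part(precision - scale) + part(scale)


two_decimals = 2 * decimal_bytes(10, 6)  # geolat + geolng as DECIMAL(10,6)
two_doubles = 2 * 8                      # geolat + geolng as DOUBLE
point = 4 + (1 + 4 + 2 * 8)              # SRID + WKB (byte order, type, X, Y)
print(two_decimals, two_doubles, point)  # prints: 10 16 25
```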
Answer by ProfileTwist
Composite indexes are very powerful as they:
复合索引非常强大,因为它们:
- Enforce structure integrity
- Enable sorting on a FILTERED id
ENFORCE STRUCTURE INTEGRITY
Composite indexes are not just another type of index; they can provide NECESSARY structure to a table by enforcing integrity as the Primary Key.
MySQL's InnoDB supports clustered indexes, and the following example illustrates why a composite index may be necessary.
To create a friends table (e.g. for a social network), we need two columns: user_id and friend_id.
Table Structure
user_id (MEDIUMINT)
friend_id (MEDIUMINT)
Primary Key -> (user_id, friend_id)
A Primary Key (PK) is unique by definition, so by creating a composite PK, InnoDB will automatically check that no duplicate user_id, friend_id pair exists when a new record is added. This is the expected behavior, as no user should have more than one record (relationship link) with, for instance, friend_id = 2.
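A minimal sketch of the composite-PK behaviour, assuming SQLite (through Python's sqlite3) in place of MySQL/InnoDB; for this purpose the uniqueness check works the same way. The columns are declared NOT NULL because SQLite, unlike InnoDB, otherwise tolerates NULLs in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends ("
    " user_id INTEGER NOT NULL,"
    " friend_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, friend_id))"
)

conn.execute("INSERT INTO friends VALUES (1, 2)")
conn.execute("INSERT INTO friends VALUES (1, 3)")  # new pair: accepted

duplicate_rejected = False
try:
    conn.execute("INSERT INTO friends VALUES (1, 2)")  # same pair again
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)
```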
Without a composite PK, we can create this schema using a surrogate key:
user_friend_id
user_id
friend_id
Primary Key -> (user_friend_id)
Now, whenever a new record is added, we will have to check that a prior record with the same user_id, friend_id combination does not already exist.
As such, a composite index can enforce structure integrity.
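For comparison, the surrogate-key version might look like the sketch below (SQLite via Python's sqlite3 as an assumed stand-in for MySQL; the helper try_insert and the index name uq_user_friend are hypothetical). It shows the duplicate being accepted, and that adding a composite UNIQUE index on (user_id, friend_id) moves the check back into the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends ("
    " user_friend_id INTEGER PRIMARY KEY,"  # surrogate key
    " user_id INTEGER NOT NULL,"
    " friend_id INTEGER NOT NULL)"
)


def try_insert(user_id, friend_id):
    """Insert one friendship; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)",
            (user_id, friend_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(1, 2)
duplicate_allowed = try_insert(1, 2)  # surrogate key alone: the duplicate slips in

# Remove the duplicate, then let a composite UNIQUE index do the checking.
conn.execute(
    "DELETE FROM friends"
    " WHERE user_friend_id = (SELECT MAX(user_friend_id) FROM friends)"
)
conn.execute("CREATE UNIQUE INDEX uq_user_friend ON friends (user_id, friend_id)")
duplicate_after_index = try_insert(1, 2)
print(first, duplicate_allowed, duplicate_after_index)  # prints: True True False
```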
ENABLE SORTING ON A FILTERED ID
It is very common to sort a set of records by the post's time (timestamp or datetime). Usually, this means the posts for a given id. Here is an example:
Table User_Wall_Posts (think of Facebook's wall posts)
user_id (MEDIUMINT)
timestamp (TIMESTAMP)
author_id (MEDIUMINT)
comment_post (TEXT)
Primary Key -> (user_id, timestamp, author_id)
We want to find all posts for user_id = 10 and sort them by timestamp (date).
SQL QUERY
SELECT * FROM User_Wall_Posts WHERE user_id = 10 ORDER BY timestamp DESC
The composite PK enables MySQL to filter and sort the results using the index; MySQL will not have to use a temporary file or filesort to fetch the results. Without a composite key, this would not be possible, and the query would be very inefficient.
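The same effect can be observed with a hedged SQLite sketch (Python's sqlite3 standing in for MySQL). WITHOUT ROWID is used so that rows are stored in primary-key order, which is only loosely comparable to InnoDB's clustered primary key; the comparison table Posts_Single with its single-column index idx_user is hypothetical. In SQLite's plan output, an extra sort shows up as USE TEMP B-TREE FOR ORDER BY, roughly the counterpart of MySQL's filesort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores rows in primary-key order, loosely like InnoDB's clustered PK.
conn.execute(
    'CREATE TABLE User_Wall_Posts ('
    ' user_id INTEGER NOT NULL,'
    ' "timestamp" TEXT NOT NULL,'
    ' author_id INTEGER NOT NULL,'
    ' comment_post TEXT,'
    ' PRIMARY KEY (user_id, "timestamp", author_id)'
    ') WITHOUT ROWID'
)
# Comparison table: surrogate key plus a single-column index on user_id.
conn.execute(
    'CREATE TABLE Posts_Single ('
    ' post_id INTEGER PRIMARY KEY,'
    ' user_id INTEGER NOT NULL,'
    ' "timestamp" TEXT NOT NULL,'
    ' author_id INTEGER NOT NULL,'
    ' comment_post TEXT)'
)
conn.execute("CREATE INDEX idx_user ON Posts_Single (user_id)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


composite_plan = plan(
    'SELECT * FROM User_Wall_Posts WHERE user_id = 10 ORDER BY "timestamp" DESC'
)
single_plan = plan(
    'SELECT * FROM Posts_Single WHERE user_id = 10 ORDER BY "timestamp" DESC'
)
print(composite_plan)
print(single_plan)
```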
As such, composite keys are very powerful and suit more than the simple problem of "I want to search for column_a, column_b, so I will use composite keys." For my current database schema, I have just as many composite keys as single keys. Don't overlook a composite key's uses!
Answer by Rick James
Composite indexes are useful for
- 0 or more "=" clauses, plus
- at most one range clause.
A composite index cannot handle two ranges. I discuss this further in my index cookbook.
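A sketch of the "equalities plus one range" rule, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a hypothetical index on (display_status, geolat, geolng). In SQLite's plan text, the terms in parentheses are the ones used to position the index search; in the two-range query, the geolng range does not appear there and can only be applied as a filter afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, display_status INT,"
    " geolat REAL, geolng REAL, address TEXT)"
)
conn.execute(
    "CREATE INDEX idx_status_lat_lng ON homes (display_status, geolat, geolng)"
)


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


one_range = plan(
    "SELECT * FROM homes WHERE display_status = 1 AND geolat BETWEEN 10 AND 20"
)
two_ranges = plan(
    "SELECT * FROM homes WHERE display_status = 1"
    " AND geolat BETWEEN 10 AND 20 AND geolng BETWEEN 30 AND 40"
)
print(one_range)
print(two_ranges)
```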
Find nearest -- If the question is really about optimizing
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
then no index can really handle both dimensions.
Instead, one must 'think outside the box'. If one dimension is implemented via partitioning and the other by carefully picking the PRIMARY KEY, one can get significantly better efficiency for very large tables of lat/lng lookups. My latlng blog goes into the details of how to implement "find nearest" on the globe. It includes code.
The PARTITIONs are stripes of latitude ranges. The PRIMARY KEY deliberately starts with longitude so that the useful rows are likely to be in the same block. A Stored Routine orchestrates the messy code for doing order by... limit... and for growing the 'square' around the target until you have enough coffee shops (or whatever). It also takes care of the great-circle calculations and of handling the dateline and poles.
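The "grow the square until you have enough" idea can be sketched in a few lines of Python. This is not the author's stored routine: it scans an in-memory list instead of querying partitions and a PRIMARY KEY, uses planar distance in degrees, and ignores great-circle math, the dateline and the poles. The function name nearest and its parameters are hypothetical.

```python
import math


def nearest(points, target, k, start_half_width=1.0):
    """Return up to k points nearest to target, growing a square search box.

    points: list of (lat, lng, name) tuples; target: (lat, lng).
    Distance is planar, in degrees: no great-circle math, dateline or poles.
    """
    if start_half_width <= 0:
        raise ValueError("start_half_width must be positive")
    k = min(k, len(points))
    if k <= 0:
        return []
    t_lat, t_lng = target

    def in_box(half):
        # Stand-in for an indexed range query on a bounding box.
        return [p for p in points
                if abs(p[0] - t_lat) <= half and abs(p[1] - t_lng) <= half]

    def dist(p):
        return math.hypot(p[0] - t_lat, p[1] - t_lng)

    half = start_half_width
    candidates = in_box(half)
    while len(candidates) < k:  # grow the square until it holds k rows
        half *= 2
        candidates = in_box(half)
    # A closer point may sit just outside the square's edge, so widen the
    # square to the k-th candidate's distance before the final ranking.
    radius = sorted(dist(p) for p in candidates)[k - 1]
    if radius > half:
        candidates = in_box(radius)
    return sorted(candidates, key=dist)[:k]


HOMES = [(0.0, 0.0, "a"), (0.5, 0.5, "b"), (0.9, -0.2, "e"),
         (3.0, 0.0, "c"), (-10.0, 5.0, "d")]
print([name for _, _, name in nearest(HOMES, (0.0, 0.0), 2)])  # prints: ['a', 'b']
```

The final widening step is what keeps the answer correct: the square only guarantees a count, while the true nearest rows must lie within the k-th candidate's distance.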
More
I have written another blog post; it compares 5 ways of doing lat/lng searches: http://mysql.rjweb.org/doc.php/latlng#representation_choices (it references the link given above as one of the 5). One of the other ways is this, and the post points out that it is optimal for a particular case:
INDEX(geolat, geolng),
INDEX(geolng, geolat)
That is, having both columns in two indexes, and not having single-column indexes on geolat and geolng, is important.
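To see what the pair of indexes buys, here is a sketch assuming SQLite via Python's sqlite3 in place of MySQL, with hypothetical index names idx_lat_lng and idx_lng_lat. A query on geolng alone can now use an index, and a bounding-box query that only needs home_id can be answered from a composite index without touching the table (COVERING INDEX in SQLite's plan output). MySQL's optimizer may make different choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, geolat REAL,"
    " geolng REAL, address TEXT)"
)
# Both column orders, and no single-column indexes.
conn.execute("CREATE INDEX idx_lat_lng ON homes (geolat, geolng)")
conn.execute("CREATE INDEX idx_lng_lat ON homes (geolng, geolat)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


lng_only = plan("SELECT * FROM homes WHERE geolng BETWEEN 30 AND 40")
bbox_ids = plan(
    "SELECT home_id FROM homes"
    " WHERE geolat BETWEEN 10 AND 20 AND geolng BETWEEN 30 AND 40"
)
print(lng_only)
print(bbox_ids)
```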
Answer by Alexander
A composite index can be useful when you want to optimise a GROUP BY clause (check this article: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html). Please pay attention:
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index)
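A sketch of the GROUP BY case, assuming SQLite via Python's sqlite3 rather than MySQL and a hypothetical index idx_city_beds on the question's city and num_of_beds columns. Without the index, SQLite reports a temporary B-tree for the grouping; with it, the rows are read in index order and arrive already grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, city TEXT,"
    " num_of_beds INT, price INT)"
)
QUERY = "SELECT city, num_of_beds, COUNT(*) FROM homes GROUP BY city, num_of_beds"


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(QUERY)  # no suitable index yet
conn.execute("CREATE INDEX idx_city_beds ON homes (city, num_of_beds)")
after = plan(QUERY)   # composite index on exactly the GROUP BY columns
print(before)
print(after)
```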
Answer by Will
To do spatial searches, you need an R-Tree algorithm, which allows searching geographical areas very quickly. That's exactly what you need for this job.
Some databases have spatial indexes built in. A quick Google search shows MySQL 5 has them (and looking at your SQL, I'm guessing you're using MySQL).
Answer by Mitch Wheat
There is no black-and-white, one-size-fits-all answer.
You should use a composite index when your query workload would benefit from one.
You need to profile your query work load in order to determine this.
A composite index comes into play when queries can be satisfied entirely from that index.
UPDATE (in response to the edit to the question): If you are selecting * from the table, the composite index may or may not be used. You will need to run EXPLAIN to be sure.
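The difference between SELECT * and a query the index can answer on its own can be sketched with SQLite (via Python's sqlite3) standing in for MySQL; the plan wording is SQLite's, and the address column is a hypothetical extra column. With SELECT *, the index finds the row but the table must still be read for address; selecting only indexed columns turns it into a covering-index lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, address TEXT,"
    " geolat REAL, geolng REAL)"
)
conn.execute("CREATE INDEX geolat_geolng ON homes (geolat, geolng)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN text for one statement."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


WHERE = " FROM homes WHERE geolat = 42.9 AND geolng = 36.4"
star = plan("SELECT *" + WHERE)                  # needs address: index + table lookup
covered = plan("SELECT geolat, geolng" + WHERE)  # answered from the index alone
print(star)
print(covered)
```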
Answer by Jim Ferrans
I'm with @Mitch: it depends entirely on your queries. Fortunately, you can create and drop indexes at any time, and you can prepend the EXPLAIN keyword to your queries to see whether the query analyzer uses the indexes.
If you'll be looking up an exact lat/long pair, this index would likely make sense. But you're probably going to be looking for homes within a certain distance of a particular place, so your queries will look something like this (see source):
select *, sqrt( pow(h2.geolat - h1.geolat, 2)
+ pow(h2.geolng - h1.geolng, 2) ) as distance
from homes h1, homes h2
where h1.home_id = 12345 and h2.home_id != h1.home_id
order by distance
and the index very likely won't be helpful at all. For geospatial queries, you need something like this.
Update: with this query:
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
The query analyzer could use an index on geolat alone, or an index on geolng alone, or possibly both indexes. I don't think it would use a composite index. But it's easy to try out each of these permutations on a real data set and then (a) see what EXPLAIN tells you and (b) measure the time the query really takes.
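The try-each-permutation approach can be scripted. The sketch below assumes SQLite via Python's sqlite3 instead of MySQL, synthetic random data and hypothetical index names; timings from SQLite on a toy table say nothing definitive about MySQL, so treat it as a template for running the same comparison against a real data set.

```python
import random
import sqlite3
import time

# Hypothetical index layouts to compare; the data below is synthetic.
CANDIDATES = {
    "no index": [],
    "single-column": [
        "CREATE INDEX i_lat ON homes (geolat)",
        "CREATE INDEX i_lng ON homes (geolng)",
    ],
    "composite": ["CREATE INDEX i_lat_lng ON homes (geolat, geolng)"],
}
QUERY = "SELECT * FROM homes WHERE geolat BETWEEN ? AND ? AND geolng BETWEEN ? AND ?"
PARAMS = (10, 20, 30, 40)


def build(index_ddl, n=20_000, seed=1):
    """Create an in-memory homes table with n random rows and the given indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, geolat REAL,"
        " geolng REAL, address TEXT)"
    )
    rng = random.Random(seed)
    conn.executemany(
        "INSERT INTO homes (geolat, geolng, address) VALUES (?, ?, ?)",
        ((rng.uniform(-90, 90), rng.uniform(-180, 180), "somewhere")
         for _ in range(n)),
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    conn.execute("ANALYZE")  # give the planner row statistics
    return conn


def evaluate(name):
    """Return (plan text, row count, seconds) for QUERY under one index layout."""
    conn = build(CANDIDATES[name])
    plan = " | ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS)
    )
    start = time.perf_counter()
    count = len(conn.execute(QUERY, PARAMS).fetchall())
    elapsed = time.perf_counter() - start
    conn.close()
    return plan, count, elapsed


for name in CANDIDATES:
    plan, count, elapsed = evaluate(name)
    print(f"{name:14} {count:5d} rows {elapsed * 1000:8.2f} ms  {plan}")
```

Because every layout is built from the same seed, the row counts must match; only the plan and the timing should change.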