MySQL 索引如何工作?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3567981/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 16:57:46  来源:igfitidea点击:

How do MySQL indexes work?

mysqlindexing

提问by good_evening

I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?

我真的对 MySQL 索引的工作方式很感兴趣,更具体地说,它们如何在不扫描整个表的情况下返回请求的数据?

It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.

这是题外话,我知道,但如果有人可以向我详细解释这一点,我将非常非常感谢。

回答by Piskvor left the building

Basically an index on a table works like an index in a book (that's where the name came from):

基本上,表上的索引就像书中的索引一样(这就是名称的来源):

Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan). On the other hand, an index has a list of keywords, so you'd consult the index and see that storageis mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).

假设您有一本关于数据库的书,并且您想找到一些关于存储的信息。如果没有索引(假设没有其他帮助,例如目录),您必须一页一页地浏览页面,直到找到主题(即full table scan)。另一方面,索引有一个关键字列表,因此您可以查阅索引并看到storage第 113-120,231 和 354 页中提到的内容。然后您可以直接翻到这些页面,而无需搜索(即使用索引,稍微快一些)。

Of course, how useful the index will be, depends on many things - a few examples, using the simile above:

当然,索引的有用程度取决于很多事情 - 一些例子,使用上面的明喻:

  • if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
  • For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
  • The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
  • 如果你有一本关于数据库的书并且索引了“数据库”这个词,你会看到它在第 1-59、61-290 和 292 到 400 页中提到。在这种情况下,索引没有多大帮助,它可能一页一页地浏览页面会更快(在数据库中,这是“选择性差”)。
  • 对于一本 10 页的书,制作索引是没有意义的,因为您最终可能会得到一本以 5 页索引为前缀的 10 页书,这很愚蠢 - 只需扫描 10 页即可完成.
  • 索引也需要有用——通常没有必要索引,例如每页字母“L”的频率。

回答by clarete

The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.

您必须知道的第一件事是,索引是一种避免扫描整个表以获得您要查找的结果的方法。

There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.

有不同种类的索引,它们是在存储层中实现的,因此它们之间没有标准,它们还取决于您使用的存储引擎。

InnoDB and the B+Tree index

InnoDB 和 B+Tree 索引

For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.

对于 InnoDB,最常见的索引类型是基于 B+Tree 的索引,它按排序顺序存储元素。此外,您不必访问真实表来获取索引值,这使您的查询返回速度更快。

The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.

这种索引类型的“问题”是你必须查询最左边的值才能使用索引。因此,如果您的索引有两列,比如 last_name 和 first_name,那么您查询这些字段的顺序非常重要

So, given the following table:

所以,给定下表:

CREATE TABLE person (
    last_name VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    INDEX (last_name, first_name)
);

This query would take advantage of the index:

此查询将利用索引:

SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"

But the following one would not

但是下面的就不会

SELECT last_name, first_name FROM person WHERE first_name = "Constantine"

Because you're querying the first_namecolumn first and it's not the leftmost column in the index.

因为您first_name首先查询该列并且它不是索引中最左边的列。

This last example is even worse:

最后一个例子更糟糕:

SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"

Because now, you're comparing the rightmost part of the rightmost field in the index.

因为现在,您正在比较索引中最右侧字段的最右侧部分。

The hash index

哈希索引

This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, <or LIKE.

不幸的是,这是一种不同的索引类型,只有内存后端支持。它闪电般快速,但仅对完整查找有用,这意味着您不能将其用于>,<或之类的操作LIKE

Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.

由于它仅适用于内存后端,因此您可能不会经常使用它。我现在能想到的主要情况是,您在内存中创建一个临时表,其中包含来自另一个选择的一组结果,并使用哈希索引在此临时表中执行许多其他选择。

If you have a big VARCHARfield, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hashand use a hash function like CRC32or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:

如果您有一个大VARCHAR字段,您可以在使用 B 树时“模拟”哈希索引的使用,方法是创建另一列并在其上保存大值的哈希。假设您将一个 url 存储在一个字段中并且值非常大。您还可以创建一个名为的整数字段,url_hashCRC32在插入时使用哈希函数或任何其他哈希函数来对 url 进行哈希处理。然后,当您需要查询此值时,您可以执行以下操作:

SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");

The problem with the above example is that since the CRC32function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:

上面示例的问题在于,由于该CRC32函数生成的散列非常小,因此最终会在散列值中产生大量冲突。如果您需要精确的值,您可以通过执行以下操作来解决此问题:

SELECT url FROM url_table 
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";

It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.

即使碰撞次数很高,仍然值得对事物进行哈希处理,因为您只会对重复的哈希值执行第二次比较(字符串一)。

Unfortunately, using this technique, you still need to hit the table to compare the urlfield.

不幸的是,使用这种技术,您仍然需要点击表格来比较url字段。

Wrap up

包起来

Some facts that you may consider every time you want to talk about optimization:

每次要谈论优化时,您可能会考虑的一些事实:

  1. Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB.

  2. Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECTby splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.

  1. 整数比较比字符串比较快得多。可以用模拟中的哈希索引的例子来说明InnoDB

  2. 也许,在流程中添加额外的步骤会使其更快,而不是更慢。这可以通过以下事实来说明:您可以SELECT通过将a拆分为两个步骤来优化 a ,使第一个步骤将值存储在新创建的内存表中,然后在第二个表上执行更重的查询。

MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.

MySQL 也有其他索引,但我认为 B+Tree 是有史以来最常用的索引,而散列索引是一件好事,但您可以在MySQL 文档中找到其他索引。

I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.

我强烈建议您阅读“高性能 MySQL”一书,上面的答案绝对是基于其关于索引的章节。

回答by Joshua

Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:

基本上,索引是按顺序排序的所有键的映射。有了一个按顺序排列的列表,而不是检查每个键,它可以做这样的事情:

1: Go to middle of list - is higher or lower than what I'm looking for?

1:转到列表中间 - 比我要查找的要高还是低?

2: If higher, go to halfway point between middle and bottom, if lower, middle and top

2:如果较高,则转到中间和底部之间的中间点,如果较低,则中间和顶部

3: Is higher or lower? Jump to middle point again, etc.

3:是高还是低?再次跳到中间点,等等。

Using that logic, you can find an element in a sorted list in about 7 steps, instead of checking every item.

使用该逻辑,您可以在大约 7 个步骤中找到排序列表中的元素,而不是检查每个项目。

Obviously there are complexities, but that gives you the basic idea.

显然有复杂性,但这给了你基本的想法。

回答by Abe Miessler

Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

看看这个链接:http: //dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

How they work is too broad of a subject to cover in one SO post.

它们的工作方式过于广泛,无法在一篇 SO 帖子中涵盖。

Hereis one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...

是我见过的对索引最好的解释之一。不幸的是,它适用于 SQL Server 而不是 MySQL。不知道这两个有没有相似...

回答by shahirnana

Take at thisvideos for more details about Indexing

快来看看这个视频有关索引更多详情

Simple Indexing You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table

简单索引 您可以在表上创建唯一索引。唯一索引意味着两行不能具有相同的索引值。这是在表上创建索引的语法

CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);

You can use one or more columns to create an index. For example, we can create an index on tutorials_tblusing tutorial_author.

您可以使用一列或多列来创建索引。例如,我们可以tutorials_tbl使用 tutorial_author创建一个索引。

CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)

You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.

您可以在表上创建一个简单的索引。只需从查询中省略 UNIQUE 关键字即可创建简单索引。简单索引允许表中存在重复值。

If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.

如果要按降序索引列中的值,可以在列名后添加保留字 DESC。

mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)

回答by WoodrowShigeru

I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's may layman's explanation.

我想加上我的 2 美分。我远不是数据库专家,但我最近阅读了一些关于这个主题的内容;足以让我尝试提供 ELI5。所以,这里可能是外行的解释。



I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".

我的理解是,索引就像是表的迷你镜像,非常像关联数组。如果您使用匹配的键输入它,那么您可以在一个“命令”中跳转到该行。

But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).

但是,如果您没有该索引/数组,则查询解释器必须使用 for 循环遍历所有行并检查匹配项(全表扫描)。

Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.

拥有索引具有额外存储(对于那个迷你镜像)的“缺点”,以换取更快查找内容的“优点”。

Note that (in dependence of your db engine) creating primary, foreign or unique keys automatically sets up a respective index as well. That same principle is basically why and how those keys work.

请注意(取决于您的数据库引擎)创建主键、外键或唯一键也会自动设置相应的索引。同样的原则基本上就是这些键为什么以及如何工作。

回答by Anush

Adding some visual representation to the list of answers. enter image description here

在答案列表中添加一些视觉表示。 在此处输入图片说明

MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.

MySQL 使用额外的间接层:二级索引记录指向主索引记录,主索引本身保存磁盘上的行位置。如果行偏移量发生变化,则只需要更新主索引。

Caveat: Disk data structure looks flat in the diagram but actually is a B+ tree.

警告:磁盘数据结构在图中看起来很平坦,但实际上是一棵 B+ 树。

Source: link

来源:链接

回答by sendon1982

In MySQL InnoDB, there are two types of index.

在 MySQL InnoDB 中,有两种类型的索引。

  1. Primary key which is called clustered index. Index key words are stored with real record data in the B+Tree leaf node.

  2. Secondary key which is non clustered index. These index only store primary key's key words along with their own index key words in the B+Tree leaf node. So when searching from secondary index, it will first find its primary key index key words and scan the primary key B+Tree to find the real data records. This will make secondary index slower compared to primary index search. However, if the selectcolumns are all in the secondary index, then no need to look up primary index B+Tree again. This is called covering index.

  1. 主键称为聚集索引。索引关键字与真实记录数据一起存储在B+Tree叶子节点中。

  2. 非聚集索引的辅助键。这些索引只将主键的关键字连同它们自己的索引关键字一起存储在 B+Tree 叶节点中。所以从二级索引搜索时,会先找到它的主键索引关键词,然后扫描主键B+Tree,找到真正的数据记录。与主索引搜索相比,这将使二级索引更慢。但是,如果select列都在二级索引中,则无需再次查找主索引B+Tree。这称为覆盖索引。