MySQL indexes - what are the best practices?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3049283/

Time: 2020-08-31 16:20:44  Source: igfitidea

MySQL indexes - what are the best practices?

Tags: mysql, indexing, query-optimization

Asked by Haroldo

I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any fields that I will be searching or selecting using a WHERE clause but sometimes it doesn't seem so black and white.

What are the best practices for MySQL indexes?

Example situations/dilemmas:

  • If a table has six columns and all of them are searchable, should I index all of them or none of them?

  • What are the negative performance impacts of indexing?

  • If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?

Answered by timdev

You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.

Broadly speaking, an index imposes an ordering on the rows of a table.

For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.

Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data and number the rows in sequential order.

Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!

Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.

Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look at N rows (where N is the number of rows), the binary search only requires that you look at about log N index entries, in the very worst case. Wow, that's sure a lot easier!
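
To make the spreadsheet analogy concrete, here is a minimal Python sketch of a table scan versus an index lookup. The `rows` data and the helper names are made up for illustration, and a sorted list searched with `bisect` only stands in for a real index (MySQL's storage engines typically use B-tree structures rather than a flat sorted list).

```python
import bisect

# Hypothetical "table": the third column (position 2) is the one we search on.
rows = [
    ("alice", 30, "M"),
    ("bob", 25, "F"),
    ("carol", 41, "M"),
    ("dave", 19, "X"),
    ("erin", 52, "F"),
]

def table_scan(rows, value):
    """Check every row: the work grows linearly with the number of rows."""
    return [n for n, row in enumerate(rows) if row[2] == value]

def build_index(rows):
    """Return the sorted distinct values and, for each, the row numbers holding it."""
    postings = {}
    for n, row in enumerate(rows):
        postings.setdefault(row[2], []).append(n)
    keys = sorted(postings)
    return keys, [postings[k] for k in keys]

def index_lookup(keys, row_lists, value):
    """Binary-search the sorted keys instead of touching every row."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return row_lists[i]
    return []

keys, row_lists = build_index(rows)
print(table_scan(rows, "M"))               # [0, 2]
print(index_lookup(keys, row_lists, "M"))  # [0, 2]
```

Both calls return the same row numbers; the difference is only in how many entries had to be examined to find them.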

Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.

So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.

On the other hand, reads become a lot faster.
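
The write-side cost can be sketched with a toy append-only table. This is only an illustration under simplifying assumptions: `TinyTable` and its `index_writes` counter are hypothetical, and a real engine's index maintenance is far more involved than one `insort` per index.

```python
import bisect

class TinyTable:
    """Toy append-only table that keeps one sorted index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        # column position -> sorted list of (value, row_number) entries
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # counts the extra work caused by indexes

    def insert(self, row):
        row_number = len(self.rows)
        self.rows.append(row)  # the "natural" append at the end
        for col, entries in self.indexes.items():
            bisect.insort(entries, (row[col], row_number))  # keep the index sorted
            self.index_writes += 1
        return row_number

unindexed = TinyTable(indexed_columns=[])
indexed = TinyTable(indexed_columns=[0, 1, 2])
for row in [("carol", 41, "M"), ("alice", 30, "M"), ("bob", 25, "F")]:
    unindexed.insert(row)
    indexed.insert(row)

print(unindexed.index_writes)  # 0
print(indexed.index_writes)    # 9
```

Three inserts into a table with three indexes cause nine index updates, which is the tradeoff described above in miniature.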

Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).

Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
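
One way to see why only the part before the first "%" helps: in a sorted list, all entries sharing a prefix sit next to each other, while entries containing a substring can be anywhere. The sketch below uses a plain sorted Python list as a stand-in for an index; the values and the `prefix_range` helper are hypothetical, and the helper assumes a non-empty prefix.

```python
import bisect

# Hypothetical index: column values kept in sorted order.
index = sorted(["bar", "barfoo", "foo", "foobar", "foozbarz", "fox", "zebra"])

def prefix_range(index, prefix):
    """Entries starting with `prefix` form one contiguous slice of a sorted list."""
    lo = bisect.bisect_left(index, prefix)
    # A string that sorts just after every string with this prefix:
    # bump the last character of the prefix by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect.bisect_left(index, upper)
    return index[lo:hi]

# LIKE 'foo%bar%': the index narrows the search to 'foo...' entries,
# then each candidate is checked for "bar" in the remainder.
candidates = prefix_range(index, "foo")
matches = [v for v in candidates if "bar" in v[len("foo"):]]

# LIKE '%bar%': there is no fixed prefix, so every entry has to be examined.
scan_matches = [v for v in index if "bar" in v]

print(candidates)    # ['foo', 'foobar', 'foozbarz']
print(matches)       # ['foobar', 'foozbarz']
print(scan_matches)  # ['bar', 'barfoo', 'foobar', 'foozbarz']
```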

Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1.
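
The left-to-right behaviour can be modelled with a sorted list of tuples. This is a rough sketch, not how MySQL stores a multi-column index: the `index_abc` data and the helper functions are invented for illustration, and the helpers assume integer column values.

```python
import bisect

# Hypothetical composite index on (a, b, c): entries sorted as tuples.
index_abc = sorted([
    (1, 5, 1), (1, 5, 2), (1, 7, 1),
    (2, 5, 1), (2, 6, 3),
    (3, 5, 1),
])

def seek_a(index, a):
    """WHERE a = ?: matches are contiguous, so two binary searches bound them."""
    lo = bisect.bisect_left(index, (a,))
    hi = bisect.bisect_left(index, (a + 1,))
    return index[lo:hi]

def seek_ab(index, a, b):
    """WHERE a = ? AND b = ?: still one contiguous slice."""
    lo = bisect.bisect_left(index, (a, b))
    hi = bisect.bisect_left(index, (a, b + 1))
    return index[lo:hi]

def filter_bc(index, b, c):
    """WHERE b = ? AND c = ?: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == b and entry[2] == c]

print(seek_a(index_abc, 1))        # [(1, 5, 1), (1, 5, 2), (1, 7, 1)]
print(seek_ab(index_abc, 1, 5))    # [(1, 5, 1), (1, 5, 2)]
print(filter_bc(index_abc, 5, 1))  # [(1, 5, 1), (2, 5, 1), (3, 5, 1)]
```

The rows matching b=5 AND c=1 sit at the start, middle, and end of the sorted order, so the ordering gives no shortcut for finding them.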

Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.

Answered by Bill Karwin

Check out presentations like More Mastering the Art of Indexing.

Update 12/2012: I have posted a new presentation of mine: How to Design Indexes, Really. I presented this in October 2012 at ZendCon in Santa Clara, and in December 2012 at Percona Live London.

Designing the best indexes is a process that has to match the queries you run in your app.

It's hard to recommend any general-purpose rules about which columns are best to index, or whether you should index all columns, no columns, which indexes should span multiple columns, etc. It depends on the queries you need to run.

Yes, there is some overhead so you shouldn't create indexes needlessly. But you should create the indexes that give benefit to the queries you need to run quickly. The overhead of an index is usually far outweighed by its benefit.

For a column that is VARCHAR(2500), you probably want to use a FULLTEXT index or a prefix index:

CREATE INDEX i ON SomeTable(longVarchar(100));

Note that a conventional index can't help if you're searching for words that may be in the middle of that long varchar. For that, use a fulltext index.
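
As a rough intuition for why a fulltext index can find words in the middle of a long string, here is a toy word-to-rows mapping in Python. It is only a conceptual sketch: the `documents` data and `build_word_index` helper are hypothetical, and MySQL's actual FULLTEXT implementation (tokenizing, stopwords, minimum word length, relevance ranking) is considerably more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical rows: row number -> contents of a long text column.
documents = {
    1: "MySQL indexes speed up reads",
    2: "Prefix indexes only help with the start of a string",
    3: "A fulltext index helps find words anywhere in the text",
}

def build_word_index(documents):
    """Map each lowercase word to the set of row numbers containing it."""
    word_index = defaultdict(set)
    for row_number, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            word_index[word].add(row_number)
    return word_index

word_index = build_word_index(documents)
print(sorted(word_index["indexes"]))  # [1, 2]
print(sorted(word_index["text"]))     # [3]
```

Because the lookup key is the word itself rather than the start of the column value, it does not matter where in the string the word appears.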

Answered by Eric J.

I won't repeat some of the good advice in other answers, but will add:

Compound Indices

You can create compound indices - an index that includes multiple columns. MySQL can use these from left to right. So if you have:

Table A
Id
Name
Category
Age
Description

and a compound index that includes Name/Category/Age in that order, these WHERE clauses would use the index:

WHERE Name='Eric' and Category='A'

WHERE Name='Eric' and Category='A' and Age > 18

but

WHERE Category='A' and Age > 18

would not use that index because everything has to be used from left to right.

Explain

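Use EXPLAIN (or EXPLAIN EXTENDED on older versions; the EXTENDED keyword was deprecated in MySQL 5.7 and removed in 8.0) to understand what indices are available to MySQL and which one it actually selects. MySQL will generally use only ONE key per table in a query (the index merge optimization is the main exception).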

EXPLAIN EXTENDED SELECT * FROM `Table` WHERE Something = 'ABC';
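
If you don't have a MySQL server handy, the habit of asking the planner what it will do can be tried out with Python's built-in `sqlite3` module. Note the assumptions: this uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in, not MySQL's EXPLAIN, so the output format and the planner's choices will differ from MySQL's; the index name `idx_name_cat_age` is made up, and the table mirrors the compound index example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE A (Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT,"
    " Age INTEGER, Description TEXT)"
)
# idx_name_cat_age is a hypothetical name for the compound index.
conn.execute("CREATE INDEX idx_name_cat_age ON A (Name, Category, Age)")

def plan(where_clause):
    """Return SQLite's plan description for SELECT * FROM A WHERE <clause>."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM A WHERE " + where_clause
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)

for clause in (
    "Name='Eric' AND Category='A'",
    "Name='Eric' AND Category='A' AND Age > 18",
    "Category='A' AND Age > 18",
):
    print(clause, "->", plan(clause))
```

The exact wording of the plan text depends on the SQLite version, but the first two queries should mention the compound index while the last one should fall back to scanning the table.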

Slow Query Log

Turn on the slow query log to see which queries are running slow.

Wide Columns

If you have a wide column where MOST of the distinction happens in the first several characters, you can use only the first N characters in your index. Example: We have a ReferenceNumber column defined as varchar(255) but in 97% of the cases, the reference number is 10 characters or less. I changed the index to only look at the first 10 characters and improved performance quite a bit.
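
One way to pick N is to measure how many distinct prefixes you get for a few candidate lengths. The sketch below does this in Python over made-up data; `prefix_selectivity` and the sample reference numbers are hypothetical, and in practice you would compute the same ratio against your real column.

```python
def prefix_selectivity(values, length):
    """Number of distinct `length`-character prefixes divided by the row count."""
    if not values:
        return 0.0
    return len({v[:length] for v in values}) / len(values)

# Hypothetical reference numbers; most are short, one is long.
reference_numbers = [
    "INV-000123", "INV-000124", "INV-000125", "PO-7781", "PO-7782",
    "INV-000123-REISSUE-2020",
]

for n in (3, 7, 10, 255):
    print(n, round(prefix_selectivity(reference_numbers, n), 2))
```

A prefix length whose ratio is already close to that of the full column gives most of the benefit with a much smaller index.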

Answered by Pete

If a table has six columns and all of them are searchable, should I index all of them or none of them?

Are you searching on a field by field basis or are some searches using multiple fields? Which fields are searched on most often? What are the field types? (Indexes work better on INTs than on VARCHARs, for example.) Have you tried using EXPLAIN on the queries that are being run?

What are the negative performance impacts of indexing?

UPDATEs and INSERTs will be slower. There are also the extra storage space requirements, but that's usually unimportant these days.

If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?

No, unless it's UNIQUE (which means it's already indexed) or you only search for exact matches on that field (not using LIKE or MySQL's fulltext search).

Generally I put an index on any fields that I will be searching or selecting using a WHERE clause

I'd normally index the fields that are the most queried, and then INTs/BOOLEANs/ENUMs rather than fields that are VARCHARs. Don't forget, often you need to create an index on combined fields, rather than an index on an individual field. Use EXPLAIN, and check the slow log.

Answered by Srikar Doddi

Load Data Efficiently: Indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a row requires not only writing the data row but also changing any affected indexes. The more indexes a table has, the more changes need to be made, and the greater the average performance degradation. Most tables receive many reads and few writes, but for a table with a high percentage of writes, the cost of index updating might be significant.

Avoid Indexes: If you don't need a particular index to help queries perform better, don't create it.

Disk Space: An index takes up disk space, and multiple indexes take up correspondingly more space. This might cause you to reach a table size limit more quickly than if there are no indexes. Avoid indexes wherever possible.

Takeaway: Don't over-index.

Answered by Anax

In general, indices help speed up database searches, at the cost of using extra disk space and slowing INSERT / UPDATE / DELETE queries. Use EXPLAIN and read the results to find out when MySQL uses your indices.

If a table has six columns and all of them are searchable, should I index all of them or none of them?

Indexing all six columns isn't always the best practice.

(a) Are you going to use any of those columns when searching for specific information?

(b) What is the selectivity of those columns (how many distinct values are stored, in comparison to the total number of records in the table)?

MySQL uses a cost-based optimizer, which tries to find the "cheapest" path when performing a query. And fields with low selectivity aren't good candidates.
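
Selectivity as described in (b) is easy to estimate by hand. The following is a small sketch with invented column data; the `selectivity` helper and the column names are assumptions for illustration, and the optimizer's real cost model takes more than this single ratio into account.

```python
def selectivity(values):
    """Distinct values divided by total rows: 1.0 means every row is unique."""
    return len(set(values)) / len(values) if values else 0.0

# Hypothetical column contents for a six-row table.
columns = {
    "email": ["a@x.io", "b@x.io", "c@x.io", "d@x.io", "e@x.io", "f@x.io"],
    "country": ["US", "US", "DE", "US", "FR", "DE"],
    "is_active": [1, 1, 1, 0, 1, 1],
}

# List the columns from most to least selective.
for name, values in sorted(columns.items(), key=lambda kv: -selectivity(kv[1])):
    print(f"{name}: {selectivity(values):.2f}")
```

A column like `is_active` splits the table into only two groups, so an index on it alone rarely narrows a search enough to be worth using.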

What are the negative performance impacts of indexing?

Already answered: extra disk space, lower performance during insert - update - delete.

If I have a VARCHAR 2500 column which is searchable from parts of my site, should I index it?

Try the FULLTEXT Index.

Answered by Paul Creasey

1/2) Indexes speed up certain select operations but they slow down other operations like inserts, updates and deletes. It can be a fine balance.

3) Use a full-text index or perhaps Sphinx.
