MySQL: Is there any performance gain in indexing a boolean field?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/10524651/
Is there any performance gain in indexing a boolean field?
Asked by Niet the Dark Absol
I'm just about to write a query that includes a WHERE isok=1. As the name implies, isok is a boolean field (actually a TINYINT(1) UNSIGNED that is set to 0 or 1 as needed).
Is there any performance gain in indexing this field? Would the engine (InnoDB in this case) perform better or worse looking up the index?
Accepted answer by Michael Koper
Not really. You should think about it like a book. If there were only 3 kinds of words in a book and you index all of them, you would have the same number of index pages as normal pages.
There would be a performance gain if there are relatively few records with one of the values. For example, if you have 1000 records and only 10 of them are TRUE, then an index would be useful when searching with isok = 1.
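One way to see whether a table falls into the "few records of one value" case is to count the rows per value. This is a minimal sketch, assuming a placeholder table named tableX that holds the isok column from the question:

```sql
-- Count how many rows hold each value of isok.
-- tableX is a placeholder table name used for illustration only.
SELECT isok, COUNT(*) AS row_count
FROM tableX
GROUP BY isok;
```

If the row_count for isok = 1 is a small fraction of the total (such as 10 out of 1000), a query filtering on isok = 1 is the kind that may benefit from an index.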
As Michael Durrant mentioned, it also makes writes slower.
EDIT: Possible duplication: Indexing boolean fields
This related question explains that even if you have an index, MySQL may not use it when too many records match: MySQL not using index when checking = 1, but using it with = 0
Answer by oucil
Just to put a finer point on several other answers here: in my experience, those looking at questions like this are in the same boat we were in. We've all heard that indexing Boolean fields is pointless, and yet...
We have a table with about 4 million rows, of which only about 1000 or so at a time will have a Boolean switch flagged, and that's what we search against. Adding an index on our Boolean field sped up queries by orders of magnitude: they went from about 9+ seconds to a fraction of a second.
Answer by ypercube
It depends on the actual queries and the selectivity of the index/query combination.
Case A: condition WHERE isok = 1 and nothing else there:
SELECT *
FROM tableX
WHERE isok = 1
If the index is selective enough (say you have 1M rows and only 1k have isok = 1), then the SQL engine will probably use the index and be faster than without it.
If the index is not selective enough (say you have 1M rows and more than 100k have isok = 1), then the SQL engine will probably not use the index and do a table scan.
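To check which way the optimizer goes for Case A, one could create the index and inspect the query plan. This is only a sketch: the index name idx_isok is made up for illustration, and the plan MySQL actually picks depends on the table's data and statistics.

```sql
-- Add a single-column index on the boolean field.
-- idx_isok is a hypothetical index name.
CREATE INDEX idx_isok ON tableX (isok);

-- Show the plan MySQL picks for the Case A query.
-- In the output, the "key" column names the index used (NULL if none),
-- and type = ALL indicates a full table scan.
EXPLAIN
SELECT *
FROM tableX
WHERE isok = 1;

-- For comparison, tell the optimizer to ignore the index and view the plan again.
EXPLAIN
SELECT *
FROM tableX IGNORE INDEX (idx_isok)
WHERE isok = 1;
```

Comparing the estimated "rows" values of the two plans gives a rough idea of how much work the index saves for this particular data distribution.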
Case B: condition WHERE isok = 1 and more stuff:
SELECT *
FROM tableX
WHERE isok = 1
AND another_column = 17
Then, it depends on what other indexes you have. An index on another_column would probably be more selective than the index on isok, which has only two possible values. An index on (another_column, isok) or (isok, another_column) would be even better.
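The composite index mentioned for Case B could be created roughly as follows. The index name is a hypothetical one chosen for this sketch; tableX and another_column are taken from the query above.

```sql
-- Composite index covering both filter columns.
-- idx_another_isok is a hypothetical index name.
CREATE INDEX idx_another_isok ON tableX (another_column, isok);

-- Check whether the Case B query picks up the composite index.
EXPLAIN
SELECT *
FROM tableX
WHERE isok = 1
  AND another_column = 17;
```

Because both conditions are equality comparisons, either column order can serve this query; the order matters more for queries that filter on only one of the two columns.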
Answer by Jinlye
It depends on the distribution of the data.
Imagine I had a book with 1000 closely typed pages, and the only words in my book were 'yes' and 'no' repeated over and over and distributed randomly. If I was asked to circle all the instances of 'yes', would an index in the back of the book help? It depends.
If there was a half-and-half random distribution of yes's and no's, then looking up in the index wouldn't help. The index would make the book a lot bigger, and anyway I'd be quicker just to start from the front and work my way through each page looking for all the instances of 'yes' and circling them, rather than looking up each item in the index and then taking the reference from the index entry to the page that it refers to.
But if there were, say, just ten instances of 'yes' in my thousand-page book and everything else was just millions of no's, then an index would save me loads of time in finding those ten instances of 'yes' and circling them.
It's the same in databases. If it's a 50:50 distribution, then an index isn't going to help - the database engine is better off just ploughing through the data from start to finish (full table scan), and the index would just make the database bigger, and slower to write and update. But if it is something like a 4000:1 distribution (as per oucil in this thread), then an index seek can speed it up hugely, if it is the 1 in 4000 items that you are looking for.
Answer by Michael Durrant
No, usually not.
You usually index fields for searching when they have high selectivity/cardinality. A boolean field's cardinality is very low in most tables. It would also make your writes fractionally slower.
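Cardinality here means the number of distinct values in the column. A rough way to look at it, again assuming the placeholder table tableX, might be:

```sql
-- Distinct values versus total rows; for a boolean field
-- distinct_values is at most 2 (NULLs are not counted).
SELECT COUNT(DISTINCT isok) AS distinct_values,
       COUNT(*)             AS total_rows
FROM tableX;

-- For indexes that already exist, MySQL reports an estimated
-- cardinality per index in the Cardinality column.
SHOW INDEX FROM tableX;
```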
Answer by Maksym Polshcha
Actually, this depends on the queries you run. But generally yes, just as with indexing a field of any other type.
Answer by ilanco
Yes, an index will improve performance; check the output of EXPLAIN with and without the index.
From the docs:
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data.
I think it's also safe to say an index will not DECREASE performance in this case, so you have only to gain from it.