MySQL varchar index length

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/15157227/

Time: 2020-08-31 16:43:18  Source: igfitidea

Tags: mysql, indexing, varchar

Question by l0st3d

I have a table like this:

CREATE TABLE `products` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(512) NOT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8;

and one like this:

CREATE TABLE `product_variants` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(11) unsigned NOT NULL,
  `product_code` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `product_code` (`product_code`),
  KEY `product_variant_product_fk` (`product_id`),
  CONSTRAINT `product_variant_product_fk` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037 DEFAULT CHARSET=utf8;

and an SQL statement like this

SELECT p.id AS id, p.name AS name, p.description AS description, pv.id AS product_variant_id, pv.product_code AS product_code
FROM products p
INNER JOIN product_variants pv ON pv.product_id = p.id
ORDER BY p.name ASC
LIMIT 300 OFFSET 0;

which if I explain gives me this:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

For a million rows, this is pretty slow. I've tried adding an index on products.name with:

ALTER TABLE products ADD INDEX `product_name_idx` (name(512));

which gives this:

mysql> show indexes from products;
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| products |          0 | PRIMARY          |            1 | id              | A         |      993658 |     NULL | NULL   |      | BTREE      |         |               |
| products |          1 | product_manf_fk  |            1 | manufacturer_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| products |          1 | product_name_idx |            1 | name            | A         |         201 |      255 | NULL   |      | BTREE      |         |               |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

I think that the Sub_part column shows the length of the indexed prefix (in bytes), as described on this page.
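The same prefix lengths can also be read from `information_schema.STATISTICS`, whose `SUB_PART` column mirrors `Sub_part`. This is a sketch that assumes `products` lives in the current default database. Note that the MySQL reference manual describes `Sub_part` as the number of indexed *characters* for nonbinary string columns (only the 767-byte limit itself is counted in bytes), so the 255 above is a character count.

```sql
-- Inspect index prefix lengths for the products table.
-- Assumes products is in the current default database (DATABASE()).
SELECT INDEX_NAME,
       SEQ_IN_INDEX,
       COLUMN_NAME,
       SUB_PART      -- prefix length; NULL when the whole column is indexed
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'products'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```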

When I re-explain the query, I get:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

which looks like the new index is not being used. As described on this page, prefix indexes are not used for sorting. In fact, if I truncate the data with:

alter table products modify `name`  varchar(255) not null;

The explain gives:

+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
| id | select_type | table | type  | possible_keys              | key                        | key_len | ref                                          | rows | Extra |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
|  1 | SIMPLE      | p     | index | PRIMARY                    | product_name_idx           | 767     | NULL                                         |  300 |       |
|  1 | SIMPLE      | pv    | ref   | product_variant_product_fk | product_variant_product_fk | 4       | oh_2c98c233_69fe_4f06_ad0d_fe6f85a5beac.p.id |    1 |       |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+

which I think backs that up. However, this page says that an InnoDB index key can be up to 767 bytes. If the length is in bytes, why does it refuse to go above 255? If it's in characters, how does it decide the length of each UTF-8 character? Is it just assuming 3?

Also, I am using this version of MySQL:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)

Answer by Buttle Butkus

I must revise my answer in light of my research. I originally posted this (quoting myself):

I believe the answer is that you cannot know how many characters will be in the index because you cannot know how many bytes your characters will be (unless you do something to exclude multi-byte characters).

I'm not sure, but it might still be correct, just not in quite the way I was thinking.

Here is the correct answer:

MySQL assumes 3 bytes per utf8 character. 255 characters is the maximum index size you can specify per column, because 256x3=768, which breaks the 767 byte limit.
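Spelled out, the arithmetic behind the 255-character cap (assuming, as the answer does, 3 bytes per utf8 character) is below. It also accounts for the `key_len` of 767 in the question's final EXPLAIN: for a NOT NULL VARCHAR key part, `key_len` adds 2 bytes for the stored string length to the 765 data bytes.

```latex
\begin{aligned}
255 \times 3 &= 765 \le 767 && \text{(fits)}\\
256 \times 3 &= 768 > 767   && \text{(too long)}\\
\texttt{key\_len} &= 255 \times 3 + 2 = 767
\end{aligned}
```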

If you don't specify an index size, MySQL chooses the maximum size (i.e. 255 characters per column). A UNIQUE constraint cannot be put on a utf8 column whose length is greater than 255, because a unique index must contain the entire cell value. But a regular index can be used: it will just index the first 255 characters (or the first 767 bytes?). That is where some mystery remains for me.

The MySTERY: I can see why MySQL assumes 3 bytes per character for safety, because otherwise the UNIQUE constraint could be broken. But the docs seem to suggest that the index is actually sized in bytes, not characters. So, suppose you put a 255-character (765-byte) index on a varchar(256) column. If the characters you store are all 1-byte ASCII characters, like A-Z, a-z and 0-9, then you could fit the entire column into the 767-byte index. And it seems like that is what would actually happen.

Below is some more information from my original answer about characters, bytes, etc.



According to Wikipedia, a UTF-8 character can be 1, 2, 3, or 4 bytes long. But according to this MySQL documentation, the maximum character size for MySQL's utf8 character set is 3 bytes, so any column index over 255 characters might hit that byte limit. But as I understand it, it might not. If most of your characters are in the ASCII range, then your average character size will be closer to 1 byte. If your average character size is, for example, 1.3 bytes (mostly 1 byte, but a significant number of 2-3 byte characters), then you could specify an index of 767/1.3 characters.

So, if you are storing mostly 1-byte characters, your actual character limit would be more like: 767 / 1.3 = 590. But it turns out that is not the way it works. 255 characters is the limit.
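The two ways of counting can be put side by side; the 1.3-byte average is the answer's illustrative figure, not a measurement. MySQL sizes the prefix by the worst case, so the second line is the one that applies:

```latex
\begin{aligned}
\text{average-size estimate:} &\quad 767 / 1.3 = 590 \text{ characters}\\
\text{worst case (3 bytes/char):} &\quad \lfloor 767 / 3 \rfloor = 255 \text{ characters}
\end{aligned}
```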

As mentioned in this MySQL documentation,

Prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for nonbinary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set.

It seems that MySQL is advising people to do a calculation/guesstimation like the one I just did in order to determine the key size for a varchar column. But in fact you cannot specify an index larger than 255 characters for utf8 columns.
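The calculation the quoted documentation has in mind uses each character set's *maximum* width, which `information_schema.CHARACTER_SETS` reports as `MAXLEN`. A sketch, assuming the default 767-byte limit (the constant is hard-coded in the query, not read from the server). On the 5.5 server from the question this should give 767 for latin1, 255 for utf8 and 191 for utf8mb4.

```sql
-- Largest column prefix (in characters) that fits the default
-- 767-byte InnoDB limit, based on each character set's maximum
-- bytes per character (MAXLEN).
SELECT CHARACTER_SET_NAME,
       MAXLEN,
       FLOOR(767 / MAXLEN) AS max_prefix_chars
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'utf8', 'utf8mb4')
ORDER BY MAXLEN;
```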

Finally, if you refer back to my second link, there is also this:

When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.

So it seems like you can get much larger indexes if you want to, with a bit of tweaking. Just make sure the row formats are DYNAMIC or COMPRESSED. You can probably specify an index of 1023 or 1024 characters in that case.
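A sketch of that tweak for MySQL 5.5.14+/5.6, assuming you have the SUPER privilege needed to change global variables; `products_wide` is a hypothetical table used only for illustration. On those versions `ROW_FORMAT=DYNAMIC` also requires the Barracuda file format and `innodb_file_per_table`. The `innodb_large_prefix` and `innodb_file_format` variables were deprecated in 5.7 (where large prefixes became the default) and removed in 8.0.

```sql
-- Hypothetical example for MySQL 5.5.14+/5.6 (requires SUPER).
-- SET GLOBAL does not survive a restart; put these in my.cnf as well.
SET GLOBAL innodb_file_format = Barracuda;  -- DYNAMIC/COMPRESSED need Barracuda
SET GLOBAL innodb_file_per_table = ON;      -- ...and a file-per-table tablespace
SET GLOBAL innodb_large_prefix = ON;        -- raise the prefix limit to 3072 bytes

-- 1000 utf8 characters x 3 bytes = 3000 bytes, under the 3072-byte limit.
CREATE TABLE products_wide (
  id int unsigned NOT NULL AUTO_INCREMENT,
  name varchar(1000) NOT NULL,
  PRIMARY KEY (id),
  KEY name_idx (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
```

Once the globals are set, an existing table can be switched with `ALTER TABLE products ROW_FORMAT=DYNAMIC;`.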



By the way, it turns out you can store 4-byte characters using the utf8mb4 character set. The utf8 character set apparently stores only "plane 0" (Basic Multilingual Plane) characters.

EDIT:

I just tried to create a composite index on a varchar(511) column with a tinyint(1) column and got the error message saying the max index size was 767 bytes. This makes me believe that MySQL assumes utf8 character set columns will contain 3 bytes per character (the maximum), and allows you to use 255 chars max. But perhaps that is only with composite indexes. I will update my answer as I find out more. But for now I'm leaving this as an edit.
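A minimal sketch of that experiment with a hypothetical table `t`: giving the VARCHAR an explicit prefix keeps its key part under the 767-byte per-column limit, while the combined key stays well below 3072 bytes.

```sql
-- Hypothetical table; utf8 columns are counted at 3 bytes per character.
-- Indexing all 511 characters of `a` would need 511 x 3 = 1533 bytes,
-- which exceeds the 767-byte limit discussed above.
CREATE TABLE t (
  a varchar(511) NOT NULL,
  b tinyint(1) NOT NULL,
  -- Prefix the VARCHAR: 255 x 3 = 765 bytes fits the 767-byte limit.
  KEY a_b (a(255), b)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```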

Answer by Rads

Limits on InnoDB Tables

Warning

Do not convert MySQL system tables in the mysql database from MyISAM to InnoDB tables. This is an unsupported operation. If you do this, MySQL does not restart until you restore the old system tables from a backup or re-generate them with the mysql_install_db program.

Warning

It is not a good idea to configure InnoDB to use data files or log files on NFS volumes. Otherwise, the files might be locked by other processes and become unavailable for use by MySQL.

Maximums and Minimums

  1. A table can contain a maximum of 1000 columns.
  2. A table can contain a maximum of 64 secondary indexes.
  3. By default, an index key for a single-column index can be up to 767 bytes. The same length limit applies to any index key prefix. For example, you might hit this limit with a column prefix index of more than 255 characters on a TEXT or VARCHAR column, assuming a UTF-8 character set and the maximum of 3 bytes for each character. When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.
  4. If you specify an index prefix length that is greater than the allowed maximum value, the length is silently reduced to the maximum length. In MySQL 5.6 and later, specifying an index prefix length greater than the maximum length produces an error.
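To check which of these limits applies to a given server and table, the relevant settings can be queried directly. This is a sketch; `innodb_large_prefix` and `innodb_file_format` only exist up to MySQL 5.7, so on 8.0 those two `SHOW VARIABLES` patterns simply return an empty set.

```sql
-- Server version decides whether over-long prefixes are silently
-- shortened (before 5.6) or rejected with an error (5.6 and later).
SELECT VERSION();
SHOW VARIABLES LIKE 'innodb_large_prefix';
SHOW VARIABLES LIKE 'innodb_file_format';
-- The Row_format column shows e.g. Compact, Redundant, Dynamic or Compressed.
SHOW TABLE STATUS LIKE 'products';
```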

When innodb_large_prefix is enabled, attempting to create an index prefix with a key length greater than 3072 for a REDUNDANT or COMPACT table causes an ER_INDEX_COLUMN_TOO_LONG error.

The InnoDB internal maximum key length is 3500 bytes, but MySQL itself restricts this to 3072 bytes. This limit applies to the length of the combined index key in a multi-column index.

The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

Reference: InnoDB Restrictions
