Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/4420164/
How much UTF-8 text fits in a MySQL "Text" field?
Asked by Xeoncross
According to MySQL, a TEXT column holds 65,535 bytes.
So if this is a legitimate boundary, then it will actually only fit about 32k UTF-8 characters, right? Or is this one of those "fuzzy" boundaries where the guys that wrote the docs can't tell characters from bytes, and it will actually allow ~64k UTF-8 characters if set to something like utf8_general_ci?
Answered by Wolph
A TEXT column can be up to 65,535 bytes.
A character in MySQL's utf8 character set can be up to 3 bytes.
So... in the worst case, with every character taking 3 bytes, your actual limit can be as low as about 21,844 characters.
See the manual for more info: http://dev.mysql.com/doc/refman/5.1/en/string-type-overview.html
A variable-length string. M represents the maximum column length in characters. The range of M is 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
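The 21,844 figure in the quoted passage can be reproduced with a little arithmetic. Here is a minimal sketch in Python, assuming the whole 65,535-byte row is spent on a single VARCHAR column and that 2 bytes of it go to the column's length prefix. The prefix size is an assumption not stated in the quote, and the constant and function names are made up for illustration.

```python
# Assumptions (not from the quoted passage): one VARCHAR column uses the
# whole row, and 2 bytes of the row are reserved for its length prefix.
MAX_ROW_BYTES = 65_535
LENGTH_PREFIX_BYTES = 2


def max_varchar_chars(max_bytes_per_char: int) -> int:
    """Largest declarable VARCHAR length for a worst-case character width."""
    return (MAX_ROW_BYTES - LENGTH_PREFIX_BYTES) // max_bytes_per_char


print(max_varchar_chars(3))  # 21844, for up to 3 bytes per character
print(max_varchar_chars(1))  # 65533, for a single-byte character set
```

The integer division rounds down because a partial character cannot be stored.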
Answered by Warren Young
UTF-8 characters can take up to 4 bytes each, not 2 as you are supposing. UTF-8 is a variable-width encoding, depending on the number of significant bits in the Unicode code point:
- 7 bits and under in the Unicode code point: 1 byte in UTF-8
- 8 to 11 bits: 2 bytes in UTF-8
- 12 to 16 bits: 3 bytes
- 17 to 21 bits: 4 bytes
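The bit-count rules above translate directly into a small function. Here is a minimal sketch in Python (the function name `utf8_byte_count` is made up for illustration), compared against Python's own UTF-8 encoder for one sample character per width:

```python
def utf8_byte_count(code_point: int) -> int:
    """Bytes needed for one code point, following the four bit-count rules."""
    bits = code_point.bit_length()  # number of significant bits
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    if bits <= 21:
        return 4
    raise ValueError("code point needs more than 21 bits")


# One sample per width: "A", e-acute, the euro sign, and an emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    # Print the rule-based count next to what the encoder actually produces.
    print(f"U+{ord(ch):04X}", utf8_byte_count(ord(ch)), len(ch.encode("utf-8")))
```

For each sample the two numbers agree: 1, 2, 3 and 4 bytes respectively.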
The original UTF-8 spec allows encoding up to 31-bit Unicode values, taking as many as 6 bytes to encode in UTF-8 form. After UTF-8 became popular, the Unicode Consortium declared that they will never use code points beyond 2^21 - 1. This is now standardized as RFC 3629.
MySQL's utf8 character set currently (i.e. version 5.6) only supports the Unicode Basic Multilingual Plane characters, for which UTF-8 needs up to 3 bytes per character. That means the current answer to your question is that your TEXT field can hold at least 21,844 characters.
Depending on how you look at it, the actual limits are higher or lower than that:
If you assume, as I do, that the BMP limitation will eventually be lifted in MySQL or one of its forks, you shouldn't count on being able to store more than 16,383 characters in that field if your MySQL client allows arbitrary Unicode text input.
On the other hand, you may be able to exploit the fact that UTF-8 is a variable width encoding. If you know your text is mostly plain English with just the occasional non-ASCII character, your effective in-practice limit could approach the maximum 64 KB - 1 character limit.
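To see how the effective limit moves with the content, it helps to count encoded bytes rather than characters. Here is a minimal sketch in Python, assuming the only constraint is the 65,535-byte size of a TEXT value. The names `TEXT_MAX_BYTES` and `fits_in_text` are made up for illustration, and the check says nothing about whether a given MySQL character set accepts 4-byte characters at all.

```python
TEXT_MAX_BYTES = 65_535  # size limit of a TEXT value, in bytes


def fits_in_text(s: str) -> bool:
    """True if the UTF-8 encoding of s is no longer than TEXT_MAX_BYTES."""
    return len(s.encode("utf-8")) <= TEXT_MAX_BYTES


print(fits_in_text("a" * 65_535))           # True: 1 byte each
print(fits_in_text("\u20ac" * 21_845))      # True: 3 bytes each, 65,535 bytes
print(fits_in_text("\u20ac" * 21_846))      # False: 65,538 bytes
print(fits_in_text("\U0001F600" * 16_383))  # True: 4 bytes each, 65,532 bytes
print(fits_in_text("\U0001F600" * 16_384))  # False: 65,536 bytes
```

Validating the encoded byte length in the application, instead of a fixed character count, covers both the pessimistic and the optimistic case described above.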
Answered by Danubian Sailor
However, when the column is used as a primary key, MySQL assumes that each character of the column's declared length adds 3 bytes to the key.
mysql> alter table test2 modify code varchar(333) character set utf8;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table test2 modify code varchar(334) character set utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
Well, using long string columns as a primary key is generally a bad practice; however, I came across this problem when working with the database of one commercial (!) product.
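The error in the session above follows from simple arithmetic: at 3 bytes per declared character, 333 characters stay under the 1000-byte key limit and 334 do not. Here is a minimal sketch in Python, taking the 1000-byte limit from the error message and the 3 bytes per character from the utf8 character set (the names are made up for illustration):

```python
MAX_KEY_BYTES = 1000  # limit reported in the error message above
BYTES_PER_CHAR = 3    # worst case for MySQL's utf8 character set


def key_bytes(declared_chars: int) -> int:
    """Key size MySQL budgets for a utf8 column of the given declared length."""
    return declared_chars * BYTES_PER_CHAR


def fits_in_key(declared_chars: int) -> bool:
    """True if the budgeted key size stays within MAX_KEY_BYTES."""
    return key_bytes(declared_chars) <= MAX_KEY_BYTES


print(key_bytes(333), fits_in_key(333))  # 999 True
print(key_bytes(334), fits_in_key(334))  # 1002 False
```

The budget is based on the declared length, not on the data actually stored, which is why the ALTER TABLE fails even on a table with no rows.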