MySQL VARCHAR Lengths and UTF-8
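Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow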
Original URL: http://stackoverflow.com/questions/1997540/
Question by Alix Axel
In MySQL, if I create a new VARCHAR(32) field in a UTF-8 table, does it mean I can store 32 bytes of data in that field, or 32 characters (multi-byte)?
Answer by M Brown
This answer showed up at the top of my Google search results but wasn't correct, so:
The confusion is probably due to different versions of MySQL being tested.
- Versions before 4.1 count bytes
- Versions 4.1 and later, including 5, count characters
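The two counting rules can be made concrete with a minimal Python sketch. `fits_varchar` is a hypothetical helper, not part of any MySQL client library. The byte rule assumes the stored bytes are UTF-8, purely to show how far the two counts can diverge.

```python
def fits_varchar(value: str, n: int, unit: str = "chars") -> bool:
    """Return True if `value` fits in a VARCHAR(n) column (hypothetical helper).

    unit="chars": length counted in characters (MySQL 4.1 and later).
    unit="bytes": length counted in bytes (pre-4.1 rule), assuming UTF-8 bytes.
    """
    if unit == "chars":
        return len(value) <= n
    if unit == "bytes":
        return len(value.encode("utf-8")) <= n
    raise ValueError(f"unknown unit: {unit!r}")


s = "\u00e9" * 20  # 20 characters of "é", 40 bytes in UTF-8
print(len(s), len(s.encode("utf-8")))  # 20 40
print(fits_varchar(s, 32, "chars"))    # True
print(fits_varchar(s, 32, "bytes"))    # False
```

The same 20-character value fits under the character rule but overflows under the byte rule.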
http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html
MySQL interprets length specifications in character column definitions in character units. (Before MySQL 4.1, column lengths were interpreted in bytes.) This applies to CHAR, VARCHAR, and the TEXT types.
Interestingly (I hadn't thought about it), the maximum length of a VARCHAR column is affected by utf8 as follows:
The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used. For example, utf8 characters can require up to three bytes per character, so a VARCHAR column that uses the utf8 character set can be declared to be a maximum of 21,844 characters.
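The 21,844 figure can be reproduced with a little arithmetic. This sketch makes three assumptions:

- The VARCHAR is the only column in the row.
- Values longer than 255 bytes carry a 2-byte length prefix (MySQL stores 1 or 2 length bytes for a VARCHAR).
- Other per-row overhead is ignored.

`max_varchar_chars` is a hypothetical name.

```python
MAX_ROW_BYTES = 65_535  # row-size limit from the quote, shared by all columns


def max_varchar_chars(max_bytes_per_char: int, length_prefix: int = 2) -> int:
    """Largest n for a lone VARCHAR(n) column under the row-size limit.

    Assumes the column is alone in the row and needs a 2-byte length prefix;
    NULL flags and other per-row overhead are ignored.
    """
    return (MAX_ROW_BYTES - length_prefix) // max_bytes_per_char


print(max_varchar_chars(3))  # utf8, up to 3 bytes per character -> 21844
```

Here (65,535 - 2) / 3 = 21,844.33, which rounds down to the 21,844 characters quoted.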
Answer by jspcal
It would let you store 32 multi-byte characters.
To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
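Here is the arithmetic from that quote as a rough Python sketch. Both helpers are hypothetical. The VARCHAR estimate assumes a 1-byte length prefix, which holds when the column's maximum byte length is at most 255 (as for VARCHAR(10) utf8). It ignores any other storage-engine overhead.

```python
UTF8_MAX_BYTES = 3  # MySQL's utf8 uses at most 3 bytes per character


def char_reserved_bytes(n: int) -> int:
    """Worst-case bytes reserved for a CHAR(n) CHARACTER SET utf8 value."""
    return n * UTF8_MAX_BYTES


def varchar_stored_bytes(value: str) -> int:
    """Approximate bytes for a VARCHAR utf8 value: the encoded data plus a
    1-byte length prefix (assumes the column's max byte length <= 255)."""
    return len(value.encode("utf-8")) + 1


print(char_reserved_bytes(10))        # 30
print(varchar_stored_bytes("hello"))  # 6
```

So under these assumptions a CHAR(10) utf8 value always reserves 30 bytes. A short ASCII value in a VARCHAR(10) costs only its actual length plus one.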
Answer by YOU
A VARCHAR(32) with collation utf8_unicode_ci holds 32 multi-byte characters; I just tested it with XAMPP.
1234567890123456789012345678901234567890
Gets truncated to:
12345678901234567890123456789012
Keep in mind that these are not regular ASCII chars.
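The test above can be mimicked in Python. The answer doesn't say which non-ASCII characters were used. This sketch assumes full-width digits (U+FF10 to U+FF19, 3 bytes each in UTF-8) as stand-ins. It models the silent truncation seen in that test; under strict SQL mode MySQL rejects an over-long value instead of truncating it.

```python
# Full-width digits as hypothetical stand-ins for the non-ASCII test data.
value = "".join(chr(0xFF10 + int(d)) for d in "1234567890" * 4)  # 40 chars

# VARCHAR(32) keeps the first 32 *characters*, not the first 32 bytes.
stored = value[:32]

print(len(value), len(value.encode("utf-8")))    # 40 120
print(len(stored), len(stored.encode("utf-8")))  # 32 96
```

Thirty-two characters survive even though they occupy 96 bytes, which is what "counts characters" means in practice.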
Answer by Nudge
It is better to use CHAR for frequently updated tables, because every row then has the same fixed length, which is faster to work with. VARCHAR columns make row sizes dynamic. That's not good for MyISAM, but I don't know about InnoDB and other engines. For example, if you have a very narrow "type" column, it may be better to use CHAR(2) with the latin1 character set so that it takes up only minimal space.
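A toy model shows why fixed-width rows are convenient. When every row has the same length, row i starts at byte i * ROW_LEN and can be overwritten in place. This is a simplified illustration with a hypothetical single CHAR(2) latin1 column, not MyISAM's actual on-disk format.

```python
ROW_LEN = 2  # one CHAR(2) latin1 column: exactly 2 bytes per row


def encode_char2(value: str) -> bytes:
    """Encode a CHAR(2) latin1 value, space-padded to the fixed width."""
    data = value.encode("latin-1")
    if len(data) > ROW_LEN:
        raise ValueError("value too long for CHAR(2)")
    return data.ljust(ROW_LEN, b" ")


table = bytearray()
for t in ["A", "BC", "D"]:
    table += encode_char2(t)


def update(i: int, value: str) -> None:
    """Overwrite row i in place; its offset never changes."""
    offset = i * ROW_LEN
    table[offset:offset + ROW_LEN] = encode_char2(value)


update(1, "XY")
print(bytes(table))  # b'A XYD '
```

With variable-length rows, a longer new value might not fit where the old one was, so an update could no longer be a simple in-place overwrite.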
Answer by Laurent Lyaudet
If you connect to the database using latin1 encoding (for example, from PHP) and save a PHP UTF-8 string in a MySQL UTF-8 column, the string will be double-encoded as UTF-8.
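The double encoding can be simulated at the byte level. This sketch uses Python's `latin-1` codec. MySQL's `latin1` is actually closer to Windows-1252, so the two differ for some bytes in the 0x80 to 0x9F range, though not for the `é` used here. `double_encode` is a hypothetical helper.

```python
def double_encode(s: str) -> bytes:
    """Simulate UTF-8 text sent over a latin1 connection into a UTF-8 column.

    The client's UTF-8 bytes are read as latin1 characters (one character
    per byte), and those characters are stored re-encoded as UTF-8.
    """
    as_latin1_chars = s.encode("utf-8").decode("latin-1")
    return as_latin1_chars.encode("utf-8")


print("\u00e9".encode("utf-8"))  # b'\xc3\xa9'
print(double_encode("\u00e9"))   # b'\xc3\x83\xc2\xa9' (displays as "Ã©")
```

One 2-byte character turns into two characters, and four bytes, in the column.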
If the UTF-8 string $s is 32 characters long but 64 bytes long, and the column is a UTF-8 VARCHAR(32), the double encoding converts $s into a 64-character UTF-8 string. The database then truncates it to its first 32 characters, which correspond to the first 32 bytes of $s. You may end up thinking that MySQL 5 behaves like MySQL 4, but this is in fact a second, different cause of the same effect.
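The VARCHAR(32) scenario can be traced the same way. Here `s` stands in for `$s`, and 32 copies of `é` (2 bytes each in UTF-8) are a hypothetical choice of string. The sketch assumes the value is read back over the same latin1 connection, which undoes the first conversion.

```python
s = "\u00e9" * 32  # stands in for $s: 32 characters, 64 bytes in UTF-8

# Over a latin1 connection, each byte of $s becomes one character.
mangled = s.encode("utf-8").decode("latin-1")
print(len(mangled))  # 64

# VARCHAR(32) keeps the first 32 characters, i.e. the first 32 bytes of $s.
stored = mangled[:32]

# Reading back over the latin1 connection reverses the mangling,
# but only 16 of the original 32 characters survived the truncation.
recovered = stored.encode("latin-1").decode("utf-8")
print(len(recovered))  # 16
```

The result looks exactly like byte-based truncation, even though MySQL counted characters.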