What is the difference between the utf8mb4 and utf8 character sets in MySQL?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/30074492/

Tags: mysql, encoding, utf-8, character-encoding, utf8mb4

Asked by Mojtaba Rezaeian

What is the difference between the utf8mb4 and utf8 charsets in MySQL?

I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious to know how the utf8mb4 group of encodings differs from the other encoding types defined in MySQL Server.

Are there any special benefits/purposes of using utf8mb4 rather than utf8?

Answer by CodeCaster

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (an alias of "utf8mb3") only stores a maximum of three bytes per code point.
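
As a rough illustration of the variable-length point (a Python sketch, not something from the original answer; the helper name utf8_byte_length is made up for this example), the byte count per code point can be checked like this:

```python
def utf8_byte_length(ch: str) -> int:
    """Return how many bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per byte length: ASCII letter, accented Latin letter,
# euro sign, and an emoji from outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X} needs {utf8_byte_length(ch)} byte(s)")
```

The last sample needs four bytes, which is exactly the case that MySQL's three-byte "utf8" cannot hold.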

So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.
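
A minimal sketch of what that range limit means in practice, assuming you want to check application-side text before it reaches a "utf8"/"utf8mb3" column. The names BMP_MAX, needs_utf8mb4 and supplementary_chars are hypothetical helpers for illustration, not part of MySQL or any library:

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def supplementary_chars(text: str) -> list[str]:
    """Return the characters in text whose code points lie above the BMP."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


def needs_utf8mb4(text: str) -> bool:
    """True if text contains any code point that a three-byte charset cannot hold."""
    return any(ord(ch) > BMP_MAX for ch in text)


print(needs_utf8mb4("caf\u00e9 \u20ac5"))        # BMP only
print(needs_utf8mb4("thanks \U0001F600"))        # contains an emoji
print(supplementary_chars("a\U0001F600b\U0001F4A9"))
```

This only looks at code point values; it says nothing about how a particular server or connection is configured.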

This is what (a previous version of the same page at) the MySQL documentation has to say about it:

The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters:

  • For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.

So if you want your column to support storing characters lying outside the BMP (and you usually want to), such as emoji, use "utf8mb4". See also "What are the most common non-BMP Unicode characters in actual use?".

Answer by Jimmy Kane

The utf8mb4 character set is useful because nowadays we need support for storing not only language characters but also symbols, newly introduced emojis, and so on.

A nice read, "How to support full Unicode in MySQL databases" by Mathias Bynens, can also shed some light on this.

Answer by simhumileco

Taken from the MySQL 8.0 Reference Manual:

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.

In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and will be removed in a future MySQL release. At that point utf8 will become a reference to utf8mb4.

So regardless of this alias, you can deliberately choose the utf8mb4 encoding yourself.

To complete the answer, I'd like to add @WilliamEntriken's comment below (also taken from the manual):

To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.
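
One way to act on that advice is to scan your own schema or migration files for bare utf8 references. The following is only a sketch under assumptions: the function name find_ambiguous_utf8 and the sample statement are invented for illustration, and the regular expression is a simple text match, not a SQL parser.

```python
import re

# Matches "utf8" as a whole charset name or as a collation prefix
# (for example utf8_general_ci), but not utf8mb3 or utf8mb4.
AMBIGUOUS_UTF8 = re.compile(r"\butf8(?!mb[34])\w*", re.IGNORECASE)


def find_ambiguous_utf8(sql: str) -> list[str]:
    """Return every bare utf8 reference found in a piece of SQL text."""
    return AMBIGUOUS_UTF8.findall(sql)


# Hypothetical statement used only to exercise the helper.
ddl = "CREATE TABLE notes (body TEXT) CHARACTER SET utf8 COLLATE utf8_general_ci"
print(find_ambiguous_utf8(ddl))
print(find_ambiguous_utf8("ALTER TABLE notes CONVERT TO CHARACTER SET utf8mb4"))
```

Because it works on raw text, it will also flag utf8 inside comments or string literals, so treat the hits as candidates to review rather than definite problems.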