What is the difference between the utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?

Question

Asked by Flux

I logged into MariaDB/MySQL and entered:

SHOW COLLATION;

I see utf8mb4_unicode_ci and utf8mb4_unicode_520_ci among the available collations. What is the difference between these two collations, and which one should we use?

Answer 1

Answered by StuiterSlurf

You will need to read the documentation. I can't tell you which one you should use, because every project is different.

10.1.3 Collation Naming Conventions

MySQL collation names follow these conventions:

A collation name starts with the name of the character set with which it is associated, followed by one or more suffixes indicating other collation characteristics. For example, utf8_general_ci and latin1_swedish_ci are collations for the utf8 and latin1 character sets, respectively.

A language-specific collation includes a language name. For example, utf8_turkish_ci and utf8_hungarian_ci sort characters for the utf8 character set using the rules of Turkish and Hungarian, respectively.

Case sensitivity for sorting is indicated by _ci (case insensitive), _cs (case sensitive), or _bin (binary; character comparisons are based on character binary code values). For example, latin1_general_ci is case insensitive, latin1_general_cs is case sensitive, and latin1_bin uses binary code values.

For Unicode, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:

utf8_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys (http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).

utf8_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).

For Unicode, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24. For more information, see Section 2.11.3, “Checking Whether Tables or Indexes Must Be Rebuilt”, and Section 2.11.4, “Rebuilding or Repairing Tables or Indexes”.

Source : https://dev.mysql.com/doc/refman/5.6/en/charset-collation-names.html
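To relate the naming convention back to the question, the SHOW COLLATION output can be narrowed to the collations whose names begin with utf8mb4_unicode. This is only a sketch: it assumes the server provides the utf8mb4 character set, and the exact rows returned depend on the MariaDB/MySQL version.

```sql
-- List only the collations whose names start with "utf8mb4_unicode".
-- In a LIKE pattern "_" matches any single character (including a literal
-- underscore) and "%" matches any sequence of characters.
SHOW COLLATION LIKE 'utf8mb4_unicode%';

-- The same filter written as a WHERE clause on the Collation column.
SHOW COLLATION WHERE Collation LIKE 'utf8mb4_unicode%';
```

A name without a version number in this list is the UCA 4.0.0 collation, and a name containing 520 is the UCA 5.2.0 one, as described above.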

Answer 2

Answered by Kamil Kiełczewski

I will expand on @StuiterSlurf's answer and focus on the details of utf8mb4_unicode_ci and utf8mb4_unicode_520_ci:

As Peter Gulutzan has described, there is a problem with sorting and comparing the Polish letter "Ł" (L with stroke; lower case "ł"; HTML escapes: &#322; and &#321;). The collations treat it as follows (the same holds for the utf8mb4 variants):

utf8_polish_ci      Ł greater than L and less than M
utf8_unicode_ci     Ł greater than L and less than M
utf8_unicode_520_ci Ł equal to L
utf8_general_ci     Ł greater than Z

In Polish, the letter Ł comes after L and before M, so each collation gives a different sorting result. None of these collations is better or worse than the others; it depends on your needs.
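The behavior listed in the table can be checked directly on a server. The following is a minimal sketch, assuming a utf8mb4 connection and a server that provides both collations; the column aliases and the derived-table name are illustrative. Going by the table, the first comparison should report the letters as different and the second as equal, but results may vary by version, so verify on your own installation.

```sql
-- Make sure string literals are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Compare Ł with L under each collation (1 = equal, 0 = not equal).
SELECT
  'Ł' = 'L' COLLATE utf8mb4_unicode_ci     AS equal_under_unicode_ci,
  'Ł' = 'L' COLLATE utf8mb4_unicode_520_ci AS equal_under_unicode_520_ci;

-- Sort a few letters under one collation; change the COLLATE clause
-- (for example to utf8mb4_general_ci) to compare the resulting orderings.
SELECT letter
FROM (
  SELECT 'L' AS letter UNION ALL
  SELECT 'Ł' UNION ALL
  SELECT 'M' UNION ALL
  SELECT 'Z'
) AS letters
ORDER BY letter COLLATE utf8mb4_unicode_ci;
```

Running the same checks against your real data is a reasonable way to decide which collation fits a project, since the choice affects both ORDER BY results and equality comparisons such as unique indexes.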

Answer 3

Answered by Peter Gulutzan

To see a bit more discussion of the actual differences, you can go to https://dev.mysql.com/worklog/task/?id=2673 and click "High Level Architecture".
