Source: Stack Overflow, http://stackoverflow.com/questions/37307146/ (content licensed under CC BY-SA 4.0; attribute it to the original authors).
Difference between utf8mb4_unicode_ci and utf8mb4_unicode_520_ci collations in MariaDB/MySQL?
Asked by Flux
I logged into MariaDB/MySQL and entered:
SHOW COLLATION;
I see utf8mb4_unicode_ci and utf8mb4_unicode_520_ci among the available collations. What is the difference between these two collations, and which should we be using?
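For readers who want to reproduce this, a minimal sketch that narrows the listing to the collations in question (other utf8mb4_unicode_* variants may also appear, depending on the server and version):

```sql
-- List only the utf8mb4 Unicode collations instead of the full output.
-- Note: "_" is a single-character wildcard in LIKE, which is harmless here.
SHOW COLLATION LIKE 'utf8mb4_unicode%';
```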
Answered by StuiterSlurf
You will need to read the documentation. I can't tell you which one you should use, because every project is different.
10.1.3 Collation Naming Conventions
MySQL collation names follow these conventions:
A collation name starts with the name of the character set with which it is associated, followed by one or more suffixes indicating other collation characteristics. For example, utf8_general_ci and latin1_swedish_ci are collations for the utf8 and latin1 character sets, respectively.
A language-specific collation includes a language name. For example, utf8_turkish_ci and utf8_hungarian_ci sort characters for the utf8 character set using the rules of Turkish and Hungarian, respectively.
Case sensitivity for sorting is indicated by _ci (case insensitive), _cs (case sensitive), or _bin (binary; character comparisons are based on character binary code values). For example, latin1_general_ci is case insensitive, latin1_general_cs is case sensitive, and latin1_bin uses binary code values.
For Unicode, collation names may include a version number to indicate the version of the Unicode Collation Algorithm (UCA) on which the collation is based. UCA-based collations without a version number in the name use the version-4.0.0 UCA weight keys. For example:
utf8_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys (http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).
utf8_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).
For Unicode, the xxx_general_mysql500_ci collations preserve the pre-5.1.24 ordering of the original xxx_general_ci collations and permit upgrades for tables created before MySQL 5.1.24. For more information, see Section 2.11.3, “Checking Whether Tables or Indexes Must Be Rebuilt”, and Section 2.11.4, “Rebuilding or Repairing Tables or Indexes”.
Source: https://dev.mysql.com/doc/refman/5.6/en/charset-collation-names.html
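As a small illustration of the naming conventions quoted above, the following sketch declares two columns that differ only in the UCA version of their collation and then reads the collation names back from the data dictionary. The table and column names (collation_demo, name_uca400, name_uca520) are hypothetical and chosen only for this example.

```sql
-- Hypothetical demo table: same character set, two UCA versions.
CREATE TABLE collation_demo (
    -- no version in the name: based on UCA 4.0.0 weight keys
    name_uca400 VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
    -- "_520_" in the name: based on UCA 5.2.0 weight keys
    name_uca520 VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
);

-- Read the character set and collation of each column back.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'collation_demo';
```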
Answered by Kamil Kiełczewski
I will expand on @StuiterSlurf's answer and focus on the details of utf8mb4_unicode_ci / utf8mb4_unicode_520_ci:
As described by Peter Gulutzan, there is a problem with sorting/comparing the Polish letter "Ł" (L with stroke; lower case: "ł"; HTML escapes: &#322; for ł and &#321; for Ł). The collations behave as follows (the same applies to the utf8mb4 variants):
utf8_polish_ci       Ł greater than L and less than M
utf8_unicode_ci      Ł greater than L and less than M
utf8_unicode_520_ci  Ł equal to L
utf8_general_ci      Ł greater than Z
In Polish, the letter Ł comes after L and before M. Different collations therefore give different sorting results. None of these collations is better or worse; it depends on your needs.
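One way to check the behavior listed above on your own server is sketched below. It assumes the client sends UTF-8 and that the listed collations are installed; the temporary table name (letters) and column name (ch) are hypothetical. No output is shown, because results can vary with the server and version.

```sql
-- Make sure string literals are interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Equality checks: an explicit COLLATE on one operand decides the comparison.
SELECT
    'Ł' = 'L' COLLATE utf8mb4_unicode_ci     AS eq_l_unicode,
    'Ł' = 'L' COLLATE utf8mb4_unicode_520_ci AS eq_l_unicode_520,
    'Ł' > 'Z' COLLATE utf8mb4_general_ci     AS gt_z_general;

-- Sorting check on a small hypothetical table.
CREATE TEMPORARY TABLE letters (ch VARCHAR(1) CHARACTER SET utf8mb4);
INSERT INTO letters VALUES ('L'), ('Ł'), ('M'), ('Z');

SELECT ch FROM letters ORDER BY ch COLLATE utf8mb4_polish_ci;
SELECT ch FROM letters ORDER BY ch COLLATE utf8mb4_general_ci;
```

If the list above holds on your server, eq_l_unicode_520 should be 1 while eq_l_unicode is 0, and the general_ci ordering should place Ł after Z.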
Answered by Peter Gulutzan
To see a bit more discussion of the actual differences, you can go to https://dev.mysql.com/worklog/task/?id=2673 and click "High Level Architecture".