MySQL utf8_general_ci 和 utf8_unicode_ci 有什么区别？

Question

提问by reconbot

Possible Duplicate:
What's the difference between utf8_general_ci and utf8_unicode_ci

可能重复：
utf8_general_ci 和 utf8_unicode_ci 有什么区别

I've got two options for unicode that look promising for a mysql database.

我有两个 unicode 选项，它们看起来对 mysql 数据库很有希望。

utf8_general_ci unicode (multilingual), case-insensitive
utf8_unicode_ci unicode (multilingual), case-insensitive

Can you please explain what is the difference between utf8_general_ci and utf8_unicode_ci? What are the effects of choosing one over the other when designing a database?

你能解释一下 utf8_general_ci 和 utf8_unicode_ci 有什么区别吗？在设计数据库时选择一个而不是另一个有什么影响？

Answer 1

回答by Timotei

utf8_general_ciis a very simple — and on Unicode, very broken — collation, one that gives incorrect resultson general Unicode text. What it does is:

utf8_general_ci是一个非常简单的 - 在 Unicode 上，非常破碎 - 排序规则，它在一般 Unicode 文本上给出错误的结果。它的作用是：

converts to Unicode normalization form D for canonical decomposition
removes any combining characters
converts to upper case

转换为 Unicode 规范化形式 D 以进行规范分解
删除任何组合字符
转换为大写

This does not work correctly on Unicode, because it does not understand Unicode casing. Unicode casing alone is much more complicated than an ASCII-minded approach can handle. For example:

这在 Unicode 上不能正常工作，因为它不理解 Unicode 大小写。Unicode 大小写本身比 ASCII 方法可以处理的要复杂得多。例如：

The lowercase of “?” is “?”, but the uppercase of “?” is “SS”.
There are two lowercase Greek sigmas, but only one uppercase one; consider “Σ?συφο?”.
Letters like “?” do not decompose to an “o” plus a diacritic, meaning that it won't correctly sort.

“?”的小写是“？”，但是“？”的大写是“SS”。
有两个小写的希腊西格玛，但只有一个大写的；考虑“Σ?συφο?”。
像“？”这样的字母不要分解为“o”加上变音符号，这意味着它不会正确排序。

There are many other subtleties.

还有许多其他微妙之处。

utf8_unicode_ciuses the standard Unicode Collation Algorithm, supports so called expansions and ligatures, for example: German letter ? (U+00DF LETTER SHARP S) is sorted near "ss" Letter ? (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_unicode_ci使用标准的Unicode Collation Algorithm，支持所谓的扩展和连字，例如： German letter ? (U+00DF LETTER SHARP S) 在“ss”字母附近排序？(U+0152 LATIN CAPITAL LIGATURE OE) 排序在“OE”附近。

utf8_general_cidoes not support expansions/ligatures, it sorts all these letters as single characters, and sometimes in a wrong order.

utf8_general_ci不支持扩展/连字，它将所有这些字母排序为单个字符，有时排序错误。

utf8_unicode_ciis generallymore accurate for all scripts. For example, on Cyrillic block: utf8_unicode_ciis fine for all these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian. While utf8_general_ci is fine only for Russian and Bulgarian subset of Cyrillic. Extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are sorted not well.

utf8_unicode_ci是一般所有脚本更准确。例如，在 Cyrillic 块上： utf8_unicode_ci适用于所有这些语言：俄语、保加利亚语、白俄罗斯语、马其顿语、塞尔维亚语和乌克兰语。虽然 utf8_general_ci 仅适用于西里尔文的俄语和保加利亚语子集。白俄罗斯语、马其顿语、塞尔维亚语和乌克兰语中使用的额外字母排序不好。

The cost of utf8_unicode_ciis that it is a littlebit slower than utf8_general_ci. But that's the price you pay for correctness. Either you can have a fast answer that's wrong, or a very slightly slower answer that's right. Your choice. It is very difficult to ever justify giving wrong answers, so it's best to assume that utf8_general_cidoesn't exist and to always use utf8_unicode_ci. Well, unless you want wrong answers.

成本utf8_unicode_ci是，它是一个稍微有点慢utf8_general_ci。但这就是您为正确性付出的代价。您可以有一个快速的错误答案，或者一个非常慢的正确答案。你的选择。很难证明给出错误答案是正确的，因此最好假设它utf8_general_ci不存在并始终使用utf8_unicode_ci. 好吧，除非你想要错误的答案。

Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748

来源：http: //forums.mysql.com/read.php?103,187048,188748#msg-188748

Answer 2

回答by Gumbo

From Unicode Character Setsin the MySQL documentation:

来自MySQL 文档中的Unicode 字符集：

For any Unicode character set, operations performed using the _general_cicollation are faster than those for the _unicode_cicollation. For example, comparisons for the utf8_general_cicollation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_cisupports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “?” is equal to “ss”. utf8_unicode_cialso supports contractions and ignorable characters. utf8_general_ciis a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

对于任何 Unicode 字符集，使用_general_ci排序规则执行的操作比使用排序规则执行的操作更快_unicode_ci。例如，utf8_general_ci归类的比较比的比较快，但正确性稍差utf8_unicode_ci。这样做的原因是utf8_unicode_ci支持扩展等映射；也就是说，当一个字符与其他字符的组合相等时。例如，在德语和其他一些语言中，“ ?” 等于“ ss”。utf8_unicode_ci还支持收缩和可忽略的字符。utf8_general_ci是不支持扩展、收缩或可忽略字符的旧排序规则。它只能在字符之间进行一对一的比较。

MySQL utf8_general_ci 和 utf8_unicode_ci 有什么区别？

提问by reconbot

回答by Timotei

回答by Gumbo

相关推荐

最近更新

标签

MySQL utf8_general_ci 和 utf8_unicode_ci 有什么区别？

提问by reconbot

回答by Timotei

回答by Gumbo

相关推荐

LOAD DATA LOCAL INFILE 给出错误 The used command is not allowed with this MySQL version

MySQL 查找两个经纬度点之间距离的最快方法

将查询结果设置为 MySQL 中的变量

排序规则的非法混合 MySQL 错误

相关推荐

最近更新

标签