MySQL utf8_general_ci 和 utf8_unicode_ci 有什么区别?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1036454/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 13:34:33  来源:igfitidea点击:

What are the differences between utf8_general_ci and utf8_unicode_ci?

mysqlunicodecharacter-encoding

提问by reconbot

Possible Duplicate:
What's the difference between utf8_general_ci and utf8_unicode_ci

可能重复:
utf8_general_ci 和 utf8_unicode_ci 有什么区别

I've got two options for unicode that look promising for a mysql database.

我有两个 unicode 选项,它们看起来对 mysql 数据库很有希望。

utf8_general_ci unicode (multilingual), case-insensitive
utf8_unicode_ci unicode (multilingual), case-insensitive

Can you please explain what is the difference between utf8_general_ci and utf8_unicode_ci? What are the effects of choosing one over the other when designing a database?

你能解释一下 utf8_general_ci 和 utf8_unicode_ci 有什么区别吗?在设计数据库时选择一个而不是另一个有什么影响?

回答by Timotei

utf8_general_ciis a very simple — and on Unicode, very broken — collation, one that gives incorrect resultson general Unicode text. What it does is:

utf8_general_ci是一个非常简单的 - 在 Unicode 上,非常破碎 - 排序规则,它在一般 Unicode 文本上给出错误的结果。它的作用是:

  • converts to Unicode normalization form D for canonical decomposition
  • removes any combining characters
  • converts to upper case
  • 转换为 Unicode 规范化形式 D 以进行规范分解
  • 删除任何组合字符
  • 转换为大写

This does not work correctly on Unicode, because it does not understand Unicode casing. Unicode casing alone is much more complicated than an ASCII-minded approach can handle. For example:

这在 Unicode 上不能正常工作,因为它不理解 Unicode 大小写。Unicode 大小写本身比 ASCII 方法可以处理的要复杂得多。例如:

  • The lowercase of “?” is “?”, but the uppercase of “?” is “SS”.
  • There are two lowercase Greek sigmas, but only one uppercase one; consider “Σ?συφο?”.
  • Letters like “?” do not decompose to an “o” plus a diacritic, meaning that it won't correctly sort.
  • “?”的小写 是“?”,但是“?”的大写 是“SS”。
  • 有两个小写的希腊西格玛,但只有一个大写的;考虑“Σ?συφο?”。
  • 像“?”这样的字母 不要分解为“o”加上变音符号,这意味着它不会正确排序。

There are many other subtleties.

还有许多其他微妙之处。

  1. utf8_unicode_ciuses the standard Unicode Collation Algorithm, supports so called expansions and ligatures, for example: German letter ? (U+00DF LETTER SHARP S) is sorted near "ss" Letter ? (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".
  1. utf8_unicode_ci使用标准的Unicode Collat​​ion Algorithm,支持所谓的扩展和连字,例如: German letter ? (U+00DF LETTER SHARP S) 在“ss”字母附近排序?(U+0152 LATIN CAPITAL LIGATURE OE) 排序在“OE”附近。

utf8_general_cidoes not support expansions/ligatures, it sorts all these letters as single characters, and sometimes in a wrong order.

utf8_general_ci不支持扩展/连字,它将所有这些字母排序为单个字符,有时排序错误。

  1. utf8_unicode_ciis generallymore accurate for all scripts. For example, on Cyrillic block: utf8_unicode_ciis fine for all these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian. While utf8_general_ci is fine only for Russian and Bulgarian subset of Cyrillic. Extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are sorted not well.
  1. utf8_unicode_ci一般所有脚本更准确。例如,在 Cyrillic 块上: utf8_unicode_ci适用于所有这些语言:俄语、保加利亚语、白俄罗斯语、马其顿语、塞尔维亚语和乌克兰语。虽然 utf8_general_ci 仅适用于西里尔文的俄语和保加利亚语子集。白俄罗斯语、马其顿语、塞尔维亚语和乌克兰语中使用的额外字母排序不好。

The cost of utf8_unicode_ciis that it is a littlebit slower than utf8_general_ci. But that's the price you pay for correctness. Either you can have a fast answer that's wrong, or a very slightly slower answer that's right. Your choice. It is very difficult to ever justify giving wrong answers, so it's best to assume that utf8_general_cidoesn't exist and to always use utf8_unicode_ci. Well, unless you want wrong answers.

成本utf8_unicode_ci是,它是一个稍微有点慢utf8_general_ci。但这就是您为正确性付出的代价。您可以有一个快速的错误答案,或者一个非常慢的正确答案。你的选择。很难证明给出错误答案是正确的,因此最好假设它utf8_general_ci不存在并始终使用utf8_unicode_ci. 好吧,除非你想要错误的答案。

Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748

来源:http: //forums.mysql.com/read.php?103,187048,188748#msg-188748

回答by Gumbo

From Unicode Character Setsin the MySQL documentation:

来自MySQL 文档中的Unicode 字符集

For any Unicode character set, operations performed using the _general_cicollation are faster than those for the _unicode_cicollation. For example, comparisons for the utf8_general_cicollation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_cisupports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “?” is equal to “ss”. utf8_unicode_cialso supports contractions and ignorable characters. utf8_general_ciis a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

对于任何 Unicode 字符集,使用_general_ci排序规则执行的操作比使用排序规则执行的操作更快_unicode_ci。例如,utf8_general_ci归类的比较比 的比较快,但正确性稍差utf8_unicode_ci。这样做的原因是utf8_unicode_ci支持扩展等映射;也就是说,当一个字符与其他字符的组合相等时。例如,在德语和其他一些语言中,“ ?” 等于“ ss”。utf8_unicode_ci还支持收缩和可忽略的字符。utf8_general_ci是不支持扩展、收缩或可忽略字符的旧排序规则。它只能在字符之间进行一对一的比较。