Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2703578/

Time: 2020-08-31 15:52:35  Source: igfitidea

Which MySQL utf8 collation is the best?

mysql collation

Asked by armin etemadi

I want a UTF8 collation for supporting:

  • English
  • Persian
  • Arabic
  • French
  • Japanese
  • Chinese

Does UTF8_GENERAL_CI support all these languages?

Answered by knittl

Yes, that is correct. UTF-8 is an encoding for the Unicode character set, which supports pretty much every language in the world.

I think the only difference comes when sorting your results: letters might come in a different order in other languages (accents, umlauts, etc.). Also, comparing a to ä might behave differently in another collation.

The _ci suffix means that sorting and comparison are case-insensitive.
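As a rough illustration of what case-insensitive (and, in some collations, accent-insensitive) comparison means, here is a minimal Python sketch. It is only an analogy: the helper names `ci_key` and `ci_ai_key` are hypothetical, and the code does not reproduce the weight tables that MySQL collations actually use.

```python
import unicodedata


def ci_key(s: str) -> str:
    """Case-insensitive comparison key: only upper/lower case is ignored."""
    return unicodedata.normalize("NFC", s.casefold())


def ci_ai_key(s: str) -> str:
    """Case- and accent-insensitive key (hypothetical helper).

    Decomposes each character and drops combining marks, so that
    'ä' and 'a' end up with the same key. This approximates the idea,
    not MySQL's real collation rules.
    """
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Same letters, different case: equal under both keys.
    print(ci_key("HELLO") == ci_key("hello"))
    # 'a' versus 'ä': the answer depends on which key (collation) is used.
    print(ci_key("a") == ci_key("\u00e4"))
    print(ci_ai_key("a") == ci_ai_key("\u00e4"))
    # Sorting with a key changes the order of accented and capitalized words.
    print(sorted(["Zebra", "apple", "\u00c4pfel"], key=ci_ai_key))
```

The point of the sketch is that "equal" and "sorted" are not properties of the stored bytes; they depend on the comparison rules you pick, which is the role a collation plays in MySQL.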

http://www.collation-charts.org/ might be of interest to you.

Answered by Aistis

UTF8_GENERAL_CI was a good choice some time ago, but it has some drawbacks now.

MySQL's utf8 charset actually uses at most 3 bytes per character instead of 4, and 4 bytes are what you need for symbols such as emoji and newer Asian characters.
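To see which characters run into the 3-byte limit, you can count UTF-8 bytes per character. The following is a small Python sketch; `utf8_len` and `fits_in_three_bytes` are hypothetical helper names, and the code only mirrors the limit described above rather than talking to MySQL.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text: str) -> bool:
    """True if every character of text needs at most 3 UTF-8 bytes.

    Hypothetical helper: it models the 3-byte-per-character limit,
    it does not query a database.
    """
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = [
        ("a", "Latin letter"),
        ("\u00e9", "French e with acute accent"),
        ("\u0634", "Persian/Arabic letter sheen"),
        ("\u65e5", "Japanese/Chinese ideograph"),
        ("\U0001F600", "emoji"),
    ]
    for ch, label in samples:
        print(f"U+{ord(ch):04X} ({label}): {utf8_len(ch)} byte(s)")
    print(fits_in_three_bytes("hello \U0001F600"))
```

Everything in the Basic Multilingual Plane (code points up to U+FFFF) fits in 3 bytes; characters above it, such as emoji, take 4.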

So MySQL has a newer charset called utf8mb4, which actually complies with the UTF-8 definition.

To fully support Asian languages, you will need to choose utf8mb4.

If you care about correct sorting in multiple languages, use a utf8mb4_unicode collation such as utf8mb4_unicode_ci instead of the general one.

You can find a more detailed answer in "What's the difference between utf8_general_ci and utf8_unicode_ci".
