Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original Stack Overflow page: http://stackoverflow.com/questions/14329314/

Time: 2020-08-31 16:10:54  Source: igfitidea

What MySQL collation is best for accepting all unicode characters?

mysql collation

Asked by HellaMad

Our column is currently collated to latin1_swedish_ci and special Unicode characters are, obviously, getting stripped out. We want to be able to accept characters such as U+272A ✪, U+2764 ❤ (see this Wikipedia article), etc. I'm leaning towards utf8_unicode_ci. Would this collation handle these and other characters? I don't care about speed, as this column isn't indexed.

MySQL Version: 5.5.28-1

Answered by deceze

The collation is the least of your worries. What you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that.

MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports only Unicode characters in the BMP (Basic Multilingual Plane), i.e. a subset of all of Unicode. utf8mb4, available since MySQL 5.5.3, supports all of Unicode.
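To make the BMP distinction concrete, here is a minimal Python sketch (not part of the original answer). It relies on the fact that MySQL's utf8 stores at most 3 bytes per character, which is exactly the UTF-8 range covering the BMP. The helper name `fits_in_mysql_utf8` and the sample characters are illustrative assumptions.

```python
def fits_in_mysql_utf8(ch: str) -> bool:
    """Return True if a single character needs at most 3 UTF-8 bytes,
    i.e. it lies in the BMP and fits MySQL's 3-byte utf8 character set."""
    return len(ch.encode("utf-8")) <= 3


# The two characters from the question, plus one emoji outside the BMP.
samples = ["\u272A", "\u2764", "\U0001F600"]

for ch in samples:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {size} bytes, fits in utf8: {fits_in_mysql_utf8(ch)}")
```

Under this check, both characters mentioned in the question are in the BMP, so utf8 would be enough for them. Characters that need 4 bytes, such as most emoji, require utf8mb4.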

The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci. The former is a general, language-independent sorting and comparison algorithm. The latter is a more complete language-independent algorithm that supports more Unicode features (e.g. treating "ß" and "ss" as equivalent), but is therefore also slower.

See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html.
