Original question: http://stackoverflow.com/questions/14329314/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
What MySQL collation is best for accepting all unicode characters?
Question by HellaMad
Our column is currently collated to latin1_swedish_ci and special Unicode characters are, obviously, getting stripped out. We want to be able to accept chars such as U+272A ✪, U+2764 ❤ (see this Wikipedia article), etc. I'm leaning towards utf8_unicode_ci; would this collation handle these and other characters? I don't care about speed, as this column isn't an index.
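Roughly why the characters disappear: latin1 has no code point for U+272A or U+2764, so any conversion into it has to substitute something. The sketch below uses Python's own latin-1 codec with errors="replace" as a stand-in for a latin1 column; it is an approximation, not MySQL's conversion code, though MySQL in non-strict mode typically stores a '?' in the same situation (strict mode rejects the value instead). The function name store_as_latin1 is hypothetical.

```python
# Python's latin-1 codec stands in for a latin1 column here
# (an approximation, not MySQL's own conversion code).

def store_as_latin1(text: str) -> str:
    """Round-trip text through latin-1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


sample = "star \u272a heart \u2764 caf\u00e9"
print(store_as_latin1(sample))  # star ? heart ? café
```

Note that "é" survives because U+00E9 exists in latin1; only characters outside latin1's 256 code points are lost.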
MySQL Version: 5.5.28-1
Answer by deceze
The collation is the least of your worries; what you need to think about is the character set for the column/table/database. The collation (the rules governing how data is compared and sorted) is just a corollary of that.
MySQL supports several Unicode character sets, utf8 and utf8mb4 being the most interesting. utf8 supports only Unicode characters in the BMP (Basic Multilingual Plane), i.e. a subset of all of Unicode. utf8mb4, available since MySQL 5.5.3, supports all of Unicode.
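A small sketch of the utf8 vs utf8mb4 distinction, assuming MySQL's utf8 is the 3-byte variant (BMP only, U+0000 to U+FFFF) and utf8mb4 allows the full 4-byte UTF-8 range. The helper names are hypothetical, and the version check assumes a plain "major.minor.patch" prefix like the asker's 5.5.28-1.

```python
import re


def utf8_byte_length(ch: str) -> int:
    """Number of bytes one character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP, i.e. needs at most 3 UTF-8 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def supports_utf8mb4(version: str) -> bool:
    """True if a version string such as '5.5.28-1' is at least 5.5.3."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= (5, 5, 3)


for ch in ["\u272a", "\u2764", "\U0001f600"]:
    print(f"U+{ord(ch):04X}: {utf8_byte_length(ch)} bytes, fits utf8: {fits_mysql_utf8(ch)}")
# U+272A: 3 bytes, fits utf8: True
# U+2764: 3 bytes, fits utf8: True
# U+1F600: 4 bytes, fits utf8: False

print(supports_utf8mb4("5.5.28-1"))  # True
```

Both characters from the question are in the BMP, so even utf8 would store them; characters above U+FFFF, such as the emoji U+1F600, are the ones that need utf8mb4, which the asker's 5.5.28 already supports.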
The collation to be used with any of the Unicode encodings is most likely xxx_general_ci or xxx_unicode_ci. The former is a general, language-independent sorting and comparison algorithm; the latter is a more complete language-independent algorithm that supports more Unicode features (e.g. treating "ß" and "ss" as equivalent), but is therefore also slower.
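To make the "ß" vs "ss" point concrete, here is a rough Python model of the two styles of comparison. It is only an analogy, not MySQL's weight tables: general_like_key gives each character exactly one weight (no expansions, like _general_ci), while unicode_like_key uses Python's full case folding plus NFKD normalization as a stand-in for the Unicode Collation Algorithm's expansions. Neither reproduces MySQL's accent handling; all function names are hypothetical.

```python
import unicodedata


def general_like_key(text: str) -> list:
    """One weight per character: lowercase each character on its own, no expansions."""
    return [ch.lower() for ch in text]


def unicode_like_key(text: str) -> str:
    """Full case folding plus NFKD, which allows expansions such as 'ß' -> 'ss'."""
    return unicodedata.normalize("NFKD", text.casefold())


def equal_under(key, a: str, b: str) -> bool:
    """Compare two strings the way a collation would: by their sort keys."""
    return key(a) == key(b)


print(equal_under(general_like_key, "Stra\u00dfe", "STRASSE"))  # False
print(equal_under(unicode_like_key, "Stra\u00dfe", "STRASSE"))  # True
```

The per-character model can never match a one-character "ß" against the two-character "ss", which is exactly the kind of feature the more complete (and slower) algorithm adds.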
See https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html.