What is the best MySQL collation for German language
Original question: http://stackoverflow.com/questions/5526169/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by TooCooL
I am building a website in German, so I will be using characters like ä, ü, ß, etc. What are your recommendations?
Answered by Pekka
This answer is outdated. For full emoji support, see this answer.
As the character set, if you can, definitely UTF-8.
As for the collation, that's a bit nasty for languages with special characters. There are various types of collations. They can all store all umlauts and other characters, but they differ in how they treat umlauts in comparisons, i.e. whether u = ü is true or false, and in sorting (where in the alphabet the umlauts are placed in the sort order).
To make a long story short, your best bet is either
utf8_unicode_ci
It allows case-insensitive searches, treats ß as ss, and uses DIN-1 sorting. Sadly, like the other language-neutral, non-binary Unicode collations, it treats u = ü, which is a terrible nuisance because a search for "Muller" will also return "Müller". You will have to work around that by setting an umlaut-aware collation at query time.
or utf8_bin
This collation does not have the u = ü problem, but only case-sensitive searches are possible.
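The behaviour described above can be checked directly in a MySQL client. This is a minimal sketch, assuming a connection that uses utf8mb4 (the utf8mb4_* collations are the four-byte counterparts of the utf8_* names in this answer) and a hypothetical table `customers` with a utf8mb4 column `last_name`; neither is part of the original answer.

```sql
-- Assumption: the client sends UTF-8 text; SET NAMES makes the connection utf8mb4.
SET NAMES utf8mb4;

-- Accent-insensitive Unicode collation: u and ü compare as equal.
SELECT 'Muller' = 'Müller' COLLATE utf8mb4_unicode_ci AS unicode_ci_equal;  -- 1

-- Binary collation: every code point is distinct, including upper and lower case.
SELECT 'Muller' = 'Müller' COLLATE utf8mb4_bin AS bin_equal;        -- 0
SELECT 'muller' = 'Muller' COLLATE utf8mb4_bin AS bin_case_equal;   -- 0

-- German phone-book collation: per the MySQL manual it treats ü as ue,
-- so it stays case-insensitive while keeping u and ü apart.
SELECT 'Muller'  = 'Müller' COLLATE utf8mb4_german2_ci AS german2_u_equal,
       'Mueller' = 'Müller' COLLATE utf8mb4_german2_ci AS german2_ue_equal;

-- Overriding the collation for a single query on the hypothetical table.
SELECT last_name
FROM customers
WHERE last_name = 'Muller' COLLATE utf8mb4_bin;
```

A per-query COLLATE clause may prevent MySQL from using an index built with the column's own collation, so it is worth checking the query plan with EXPLAIN on large tables.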
I'm not entirely sure whether there are any other side effects to using the binary collation; I asked a question about that here.
This MySQL manual page gives a good overview of the various collations and the consequences they bring in everyday use.
Here is a general overview of the available collations in MySQL.
Answered by Roland
To support the complete UTF-8 standard, you have to use the charset utf8mb4 and the collation utf8mb4_unicode_ci in MySQL!
Note: MySQL only supports 1- to 3-byte characters when using its so-called utf8 charset! This is why modern emojis are not supported, as they use 4 bytes!
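The byte counts can be verified without depending on the client encoding by building the characters from their UTF-8 bytes. A small sketch; the two sample characters are chosen here for illustration and are not from the original answer.

```sql
-- U+00FC (ü) is encoded in UTF-8 as the two bytes C3 BC.
SELECT LENGTH(CONVERT(UNHEX('C3BC') USING utf8mb4))      AS bytes,  -- 2
       CHAR_LENGTH(CONVERT(UNHEX('C3BC') USING utf8mb4)) AS chars;  -- 1

-- U+1F600 (grinning face emoji) is encoded as the four bytes F0 9F 98 80.
SELECT LENGTH(CONVERT(UNHEX('F09F9880') USING utf8mb4))      AS bytes,  -- 4
       CHAR_LENGTH(CONVERT(UNHEX('F09F9880') USING utf8mb4)) AS chars;  -- 1
```

LENGTH counts bytes and CHAR_LENGTH counts characters, so the second query shows why a charset capped at 3 bytes per character cannot hold the emoji.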
The only way to fully support the UTF-8 standard is to change the charset and collation of ALL tables and of the database itself to utf8mb4 and utf8mb4_unicode_ci. Furthermore, the database connection needs to use utf8mb4 as well.
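For the connection part, one way to set and inspect the session charset from SQL is sketched below. Most client libraries also offer their own charset option, which is not covered here.

```sql
-- Switch the current connection to utf8mb4 with the matching collation.
SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Inspect what the session actually uses.
SHOW VARIABLES LIKE 'character_set_%';
SHOW VARIABLES LIKE 'collation_%';
```

After SET NAMES, character_set_client, character_set_connection and character_set_results should all report utf8mb4.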
The MySQL server must use utf8mb4 as its default charset, which can be configured manually in /etc/mysql/conf.d/mysql.cnf:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
# character-set-client-handshake = FALSE ## better not set this!
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
Existing tables can be migrated to utf8mb4 using the following SQL statement:
ALTER TABLE <table-name> CONVERT TO
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
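Since every table has to be converted, it can help to list the ones that are still on another charset. The following sketch queries information_schema for the currently selected database and generates one ALTER TABLE statement per remaining table; it only prints the statements, it does not run them.

```sql
-- Tables in the current database whose default collation is not utf8mb4 yet.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%';

-- Generate the conversion statements for review before running them.
SELECT CONCAT('ALTER TABLE `', TABLE_NAME,
              '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;') AS stmt
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_TYPE = 'BASE TABLE'
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%';
```

Taking a backup before running the generated statements is advisable, because CONVERT TO rewrites the table data.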
Note:
- To make sure JOINs between table columns are not slowed down by charset conversions, ALL tables have to be changed!
- As the length of an index is limited in MySQL, the number of characters per index row multiplied by 4 bytes must not exceed the limit of 3072 bytes.
When the innodb_large_prefix configuration option is enabled, the index key prefix length limit is raised from 767 bytes to 3072 bytes for InnoDB tables that use the DYNAMIC or COMPRESSED row format.
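As a worked example of the index arithmetic: with up to 4 bytes per utf8mb4 character, a 3072-byte limit allows 768 indexed characters. The 767-byte figure used below is the InnoDB limit that applies without the large-prefix option (REDUNDANT and COMPACT row formats); it allows 191 characters. The table and column names in this sketch are hypothetical.

```sql
-- Maximum indexable utf8mb4 characters under each byte limit (4 bytes per character).
SELECT 3072 DIV 4 AS max_chars_large_prefix,  -- 768
       767 DIV 4  AS max_chars_small_prefix;  -- 191

-- Hypothetical table: index only the first 191 characters of the column
-- so the key fits even under the 767-byte limit.
CREATE TABLE example_users (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  KEY idx_email (email(191))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```

A prefix index like this one keeps the full 255-character column but only indexes its first 191 characters, which is usually enough for lookups.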
To change the charset and default collation of the database, run this command:
ALTER DATABASE CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Since utf8mb4 is fully backwards compatible with utf8, no mojibake or other forms of data loss should occur.
Answered by Sandro Munda
utf8_general_ci or utf8_unicode_ci.
To learn the difference, see: UTF-8: General? Bin? Unicode?