Why does MySQL use latin1_swedish_ci as the default?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/3936059/

Time: 2020-08-31 17:26:37  Source: igfitidea

Tags: mysql, encoding

Asked by Metropolis

Does anyone know why latin1_swedish_ci is the default for MySQL? It would seem to me that UTF-8 would be more compatible, right?

Defaults are usually chosen because they are the best universal choice, but in this case it does not seem that's what they did.

Accepted answer by Pekka

As far as I can see, latin1 was the default character set in pre-multibyte times and it looks like that's been continued, probably for reasons of downward compatibility (e.g. for older CREATE statements that didn't specify a collation).

From here:

What 4.0 Did

MySQL 4.0 (and earlier versions) only supported what amounted to a combined notion of the character set and collation with single-byte character encodings, which was specified at the server level. The default was latin1, which corresponds to a character set of latin1 and collation of latin1_swedish_ci in MySQL 4.1.

As to why Swedish, I can only guess that it's because MySQL AB is/was Swedish. I can't see any other reason for choosing this collation. It comes with some specific sorting quirks (äöü come after Z, I think), but they are nowhere near an international standard.

Answer by bear

latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions.
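
The difference described above can be seen outside MySQL with a small Python sketch. This is only an illustration under assumptions: it uses Python's built-in `cp1252` and `latin-1` codecs as stand-ins for MySQL's latin1 and ISO 8859-1, and the sample bytes are made up for the demo. Note that Python's `latin-1` codec maps 0x80 to 0x9f to C1 control characters rather than reporting them as undefined.

```python
# Sample bytes (chosen for this demo): two in the 0x80-0x9f range, one above it.
raw = bytes([0x80, 0x93, 0xE4])

# cp1252 assigns printable characters to 0x80 and 0x93.
as_cp1252 = raw.decode("cp1252")

# Python's ISO 8859-1 codec maps the same bytes to C1 control characters.
as_iso = raw.decode("latin-1")

# Outside the 0x80-0x9f range (here 0xE4) the two agree.
same_above_range = as_cp1252[2] == as_iso[2]

print([hex(ord(ch)) for ch in as_cp1252])
print([hex(ord(ch)) for ch in as_iso])
```

Only the first two bytes decode differently, which is why data that stays outside 0x80 to 0x9f looks identical under both interpretations.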

From:

http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html

Might help you understand why.

Answer by AndreKR

Using a single-byte encoding has some advantages over multi-byte encodings, e.g. the length of a string in bytes is equal to the length of that string in characters. With a multi-byte encoding, if you use functions like SUBSTRING, it is not intuitively clear whether you mean characters or bytes. Also, for the same reasons, it requires quite a big change to the internal code to support multi-byte encodings.
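
A rough Python sketch of the byte-versus-character point, not MySQL code: the helper name `lengths` and the sample word are made up for illustration, and Python's `latin-1` and `utf-8` codecs stand in for a single-byte and a multi-byte character set.

```python
def lengths(text: str, encoding: str) -> tuple[int, int]:
    """Return (characters, bytes) for text in the given encoding."""
    return len(text), len(text.encode(encoding))

word = "Malmö"  # hypothetical sample containing one non-ASCII character

latin1_lengths = lengths(word, "latin-1")  # single-byte: counts match
utf8_lengths = lengths(word, "utf-8")      # multi-byte: "ö" takes two bytes

# Cutting after 5 *bytes* of the UTF-8 form splits "ö" in half;
# errors="replace" shows the damage instead of raising.
cut_by_bytes = word.encode("utf-8")[:5].decode("utf-8", errors="replace")

# Cutting after 5 *characters* keeps the word intact.
cut_by_chars = word[:5]

print(latin1_lengths, utf8_lengths, cut_by_bytes, cut_by_chars)
```

With the single-byte encoding the two cuts would be the same operation, which is the simplification the answer is pointing at.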

Answer by CodesInChaos

Most strange features of this kind are historic. They did it like that a long time ago, and now they can't change it without breaking some app that depends on that behavior.

Perhaps UTF8 wasn't popular then. Or perhaps MySQL didn't support charsets where multiple bytes encode one character then.
