Why does MySQL use latin1_swedish_ci as the default?

Question

Asked by Metropolis

Does anyone know why latin1_swedish is the default for MySQL? It would seem to me that UTF-8 would be more compatible, right?

Defaults are usually chosen because they are the best universal choice, but in this case it does not seem that's what they did.

Answer 1

Accepted answer by Pekka

As far as I can see, latin1 was the default character set in pre-multibyte times and it looks like that's been continued, probably for reasons of downward compatibility (e.g. for older CREATE statements that didn't specify a collation).

From here:

What 4.0 Did
MySQL 4.0 (and earlier versions) only supported what amounted to a combined notion of the character set and collation with single-byte character encodings, which was specified at the server level. The default was latin1, which corresponds to a character set of latin1 and collation of latin1_swedish_ci in MySQL 4.1.

As to why Swedish, I can only guess that it's because MySQL AB is/was Swedish. I can't see any other reason for choosing this collation; it comes with some specific sorting quirks (ÄÖÜ come after Z, I think), but they are nowhere near an international standard.
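To make the sorting quirk concrete, here is a minimal sketch of a Swedish-alphabet sort key, assuming only that the Swedish alphabet places å, ä, ö after z. The names `SWEDISH_ALPHABET` and `swedish_key` are made up for illustration; this is a toy and not how MySQL implements latin1_swedish_ci.

```python
# Toy sort key for the Swedish alphabet: a-z followed by å, ä, ö.
# Illustration only; this is NOT MySQL's latin1_swedish_ci implementation.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    # Lowercase first so the comparison is case-insensitive (the "_ci" idea).
    # Characters outside the toy alphabet all sort last.
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


words = ["zebra", "äpple", "apple", "öl", "ål"]
swedish_sorted = sorted(words, key=swedish_key)
print(swedish_sorted)  # ['apple', 'zebra', 'ål', 'äpple', 'öl']
```

A reader used to German or English conventions would probably expect "äpple" next to "apple", which is why this ordering feels odd as a global default.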

Answer 2

Answer by bear

latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions.

from

http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html

Might help you understand why.
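The 0x80 to 0x9f difference quoted above can be seen outside MySQL as well. The following sketch uses Python's built-in `cp1252` and `latin-1` codecs as stand-ins; note that Python's `latin-1` codec maps those bytes to C1 control characters rather than treating them as undefined, so this only approximates the distinction the manual describes.

```python
# Three bytes from the 0x80-0x9f range where the two encodings differ.
raw = bytes([0x80, 0x85, 0x99])

# cp1252 assigns printable characters to these positions.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)  # €…™

# Python's ISO 8859-1 codec yields the control characters U+0080, U+0085, U+0099.
as_iso = raw.decode("latin-1")
print([hex(ord(ch)) for ch in as_iso])  # ['0x80', '0x85', '0x99']

# Above 0x9f the two encodings agree, e.g. 0xE5 is 'å' in both.
same_in_both = bytes([0xE5]).decode("cp1252") == bytes([0xE5]).decode("latin-1")
print(same_in_both)  # True
```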

Answer 3

Answer by AndreKR

Using a single-byte encoding has some advantages over multi-byte encodings, e.g. the length of a string in bytes is equal to its length in characters. With a multi-byte encoding, it is not intuitively clear whether functions like SUBSTRING count characters or bytes. For the same reason, supporting multi-byte encodings requires quite a big change to the internal code.
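A small sketch of the byte-versus-character distinction, using Python strings rather than MySQL itself (the sample word is arbitrary):

```python
text = "Malmö"

latin1_bytes = text.encode("latin-1")  # single-byte encoding
utf8_bytes = text.encode("utf-8")      # 'ö' takes two bytes in UTF-8

print(len(text))          # 5 characters
print(len(latin1_bytes))  # 5 bytes: same as the character count
print(len(utf8_bytes))    # 6 bytes: no longer matches

# Taking "the first 5" by bytes cuts the two-byte 'ö' in half.
truncated = utf8_bytes[:5]
repaired = truncated.decode("utf-8", errors="replace")
print(repaired)  # 'Malm' followed by the U+FFFD replacement character
```

Code that assumes one byte per character works unchanged on latin1 data but has to be revisited for UTF-8, which is the kind of internal change the answer refers to.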

Answer 4

Answer by CodesInChaos

Most strange features of this kind are historical. They did it like that a long time ago, and now they can't change it without breaking some app that depends on that behavior.

Perhaps UTF-8 wasn't popular then. Or perhaps MySQL didn't support charsets where multiple bytes encode one character then.
