What's the SQL national character (NCHAR) datatype really for?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3895387/

Time: 2020-09-01 07:51:53  Source: igfitidea

Tags: sql, sql-server, oracle, tsql, nvarchar

Asked by bobince

As well as CHAR (CHARACTER) and VARCHAR (CHARACTER VARYING), SQL offers an NCHAR (NATIONAL CHARACTER) and NVARCHAR (NATIONAL CHARACTER VARYING) type. In some databases, this is the better datatype to use for character (non-binary) strings:

  • In SQL Server, NCHAR is stored as UTF-16LE and is the only way to reliably store non-ASCII characters, CHAR being a single-byte codepage only;

  • In Oracle, NVARCHAR may be stored as UTF-16 or UTF-8 rather than a single-byte collation;

  • But in MySQL, NVARCHAR is VARCHAR, so it makes no difference: either type can be stored with UTF-8 or any other collation.

So, what does NATIONAL actually conceptually mean, if anything? The vendors' docs only tell you about what character sets their own DBMSs use, rather than the actual rationale. Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.

Thanks, ANSI. Thansi.

Should one use NVARCHAR for all character (non-binary) storage purposes? Are there currently-popular DBMSs in which it will do something undesirable, or which just don't recognise the keyword (or N'' literals)?

Answered by Joel Coehoorn

"NATIONAL" in this case means characters specific to different nationalities. Far East languages especially have so many characters that one byte is not enough space to distinguish them all. So if you have an English (ASCII)-only app or an English-only field, you can get away with using the older CHAR and VARCHAR types, which only allow one byte per character.
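
To make the one-byte limit concrete, here is a minimal Python sketch (not tied to any DBMS). It assumes `cp1252` as a stand-in for a typical single-byte code page; the helper name `encodable` is made up for illustration.

```python
# Sketch: a single-byte code page has at most 256 slots, so most CJK
# characters simply have no representation in it.
# Assumption: cp1252 stands in for "a single-byte code page".

def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

samples = ["hello", "caf\u00e9", "\u65e5\u672c\u8a9e"]  # ASCII, Latin, Japanese

for s in samples:
    print(s, "cp1252:", encodable(s, "cp1252"), "utf-16-le:", encodable(s, "utf-16-le"))

# A lossy conversion replaces each unrepresentable character with '?'.
lossy = "\u65e5\u672c\u8a9e".encode("cp1252", errors="replace")
print(lossy)
```

The accented Latin sample still fits in the single-byte code page, while the Japanese sample does not, which is the situation the national character types were meant to cover.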

That said, most of the time you should use NCHAR/NVARCHAR. Even if you don't think you need to support (or potentially support) multiple languages in your data, English-only apps still need to be able to sensibly handle security attacks that use foreign-language characters.

In my opinion, about the only place where the older CHAR/VARCHAR types are still preferred is for frequently referenced, ASCII-only internal codes and data on platforms like SQL Server that support the distinction: data that would be the equivalent of an enum in a client language like C++ or C#.

Answered by dan04

Meanwhile the SQL92 standard explains the feature even less helpfully, stating only that NATIONAL CHARACTER is stored in an implementation-defined character set. As opposed to a mere CHARACTER, which is stored in an implementation-defined character set. Which might be a different implementation-defined character set. Or not.

Coincidentally, this is the same "distinction" the C++ standard makes between char and wchar_t. A relic of the Dark Ages of Character Encoding, when every language/OS combination had its own character set.

Should one use NVARCHAR for all character (non-binary) storage purposes?

It is not important whether the declared type of your column is VARCHAR or NVARCHAR. But it is important to use Unicode (whether UTF-8, UTF-16, or UTF-32) for all character storage purposes.
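
As a rough illustration of why the particular Unicode encoding matters less than using Unicode at all, the Python sketch below round-trips one mixed-language string through all three encodings. The sample string and variable names are only illustrative.

```python
# Sketch: any Unicode encoding can represent the same text losslessly;
# they differ only in how many bytes they spend on it.
text = "naive \u65e5\u672c\u8a9e \u2713"  # ASCII, Japanese, and a symbol

round_trips = {}
sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    encoded = text.encode(codec)
    round_trips[codec] = encoded.decode(codec) == text  # lossless?
    sizes[codec] = len(encoded)                         # bytes used

print(round_trips)
print(sizes)
```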

Are there currently-popular DBMSs in which it will do something undesirable

Yes: In MS SQL Server, using NCHAR makes your (English) data take up twice as much space. Unfortunately, UTF-8 isn't supported yet.
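
The doubling can be seen outside the database with a small Python sketch. It only compares encoded byte lengths and ignores any per-row or per-column overhead a real server adds; the sample string is arbitrary.

```python
# Sketch: for pure-ASCII text, UTF-16LE spends two bytes per character,
# while a single-byte encoding (or UTF-8) spends one.
text = "national character"

ascii_size = len(text.encode("ascii"))      # one byte per character
utf16_size = len(text.encode("utf-16-le"))  # two bytes per character here
utf8_size = len(text.encode("utf-8"))       # same as ASCII for this text

print(ascii_size, utf16_size, utf8_size)
```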

Answered by Gary Myers

In Oracle, the database character set can be a multi-byte character set, so you can store all manner of characters in there... but you need to understand and define the length of the columns appropriately (in either BYTES or CHARACTERS).
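
The difference between byte-sized and character-sized columns can be sketched in Python. This is a simplified model, not Oracle's implementation: it assumes a UTF-8 database character set, counts Python code points as characters, and uses a hypothetical helper named `fits`.

```python
# Sketch: the same value can fit a column sized in characters but
# overflow one sized in bytes when the character set is multi-byte.
# Assumptions: UTF-8 database character set; `fits` is a made-up helper.

def fits(value: str, limit: int, semantics: str, charset: str = "utf-8") -> bool:
    """Check value against a column length counted in bytes or characters."""
    if semantics == "BYTE":
        return len(value.encode(charset)) <= limit
    if semantics == "CHAR":
        return len(value) <= limit
    raise ValueError(f"unknown length semantics: {semantics}")

name = "\u65e5\u672c\u8a9e"  # 3 characters, 9 bytes in UTF-8

print(fits(name, 5, "CHAR"))  # 3 characters against a limit of 5
print(fits(name, 5, "BYTE"))  # 9 bytes against a limit of 5
```

For ASCII-only values the two checks agree, which is why the distinction is easy to overlook until non-English data arrives.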

NVARCHAR gives you the option of keeping a single-byte database character set (which reduces the potential for confusion between BYTE- and CHARACTER-sized columns) while using NVARCHAR for multi-byte data. See here.

Since I predominantly work with English data, I'd go with a multi-byte character set (UTF-8 mostly) as the database character set and ignore NVARCHAR. If I inherited an old database that used a single-byte character set and was too big to convert, I might use NVARCHAR. But I'd prefer not to.
