什么是 PostgreSQL 等价于 SQL Server NVARCHAR?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1245217/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 22:19:03  来源:igfitidea点击:

What Is The PostgreSQL Equivalent To SQL Server NVARCHAR?

postgresql

提问by kevinw

If I have fields of NVARCHAR (or NTEXT) data type in a Microsoft SQL Server database, what would be the equivalent data type in a PostgreSQL database?

如果 Microsoft SQL Server 数据库中有 NVARCHAR(或 NTEXT)数据类型的字段,那么 PostgreSQL 数据库中的等效数据类型是什么?

采纳答案by karim79

I'm pretty sure postgres varchar is the same as Oracle/Sybase/MSSQL nvarchar even though it is not explicit in the manual:

我很确定 postgres varchar 与 Oracle/Sybase/MSSQL nvarchar 相同,即使它在手册中没有明确说明:

http://www.postgresql.org/docs/7.4/static/datatype-character.html

http://www.postgresql.org/docs/7.4/static/datatype-character.html

Encoding conversion functions are here:

编码转换函数在这里:

http://www.postgresql.org/docs/current/static/functions-string.htmlhttp://www.postgresql.org/docs/current/static/functions-string.html#CONVERSION-NAMES

http://www.postgresql.org/docs/current/static/functions-string.html http://www.postgresql.org/docs/current/static/functions-string.html#CONVERSION-NAMES

Example:

例子:

create table
nvctest (
utf8fld varchar(12)
);
insert into nvctest
select convert('PostgreSQL' using ascii_to_utf_8);
select * from nvctest;

Also, there is this responseto a similar question from a Postgresql rep:

此外,还有对来自 Postgresql 代表的类似问题的回应

All of our TEXT datatypes are multibyte-capable, provided you've installed PostgreSQL correctly.
This includes: TEXT (recommended) VARCHAR CHAR

我们所有的 TEXT 数据类型都支持多字节,前提是您已正确安装 PostgreSQL。
这包括: TEXT(推荐) VARCHAR CHAR

回答by Magnus Hagander

It's varcharand text, assuming your database is in UNICODE encoding. If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.

它是varchartext,假设您的数据库采用 UNICODE 编码。如果您的数据库采用非 UNICODE 编码,则没有特殊数据类型可以为您提供 unicode 字符串 - 您可以将其存储为字节流,但不会是字符串。

回答by user6649841

Short answer: There is no PostgreSQL equivalent to SQL Server NVARCHAR.

简短回答:没有与 SQL Server NVARCHAR 等效的 PostgreSQL。

The types of NVARCHAR(N) on different database are not equivalent. The standard allows for a wide choice of character collations and encodings/character sets. When dealing with unicode PostgreSQL and SQLServer fall into different camps and no equivalence exists.

不同数据库上的 NVARCHAR(N) 类型不等价。该标准允许广泛选择字符归类和编码/字符集。在处理 unicode 时,PostgreSQL 和 SQLServer 属于不同的阵营,不存在等价关系。

These differ w.r.t.

这些不同

  1. length semantics
  2. representable content
  3. sort order
  4. padding semantics
  1. 长度语义
  2. 可表示的内容
  3. 排序
  4. 填充语义

Thus moving data from one DB system (or encoding/character set) to another can lead to truncation/content loss.

因此,将数据从一个 DB 系统(或编码/字符集)移动到另一个可能会导致截断/内容丢失

Specifically there is no equivalentbetween a PostgreSQL (9.1) character type and SQL Server NVARCHAR.

具体来说,PostgreSQL (9.1) 字符类型和 SQL Server NVARCHAR 之间没有等价物

You may migrate the data to a PostgreSQL binary type, but would then loose text querying capabilities.

您可以将数据迁移到 PostgreSQL 二进制类型,但会失去文本查询功能。

(Unless PostgreSQL starts supporting a UTF-16 based unicode character set)

(除非 PostgreSQL 开始支持基于 UTF-16 的 unicode 字符集)

1) Length semantics

1) 长度语义

N is interpreted differently (Characters, Bytes, 2*N = Bytes) depending on database and encoding.

N 有不同的解释(字符、字节、2*N = 字节),具体取决于数据库和编码。

Microsoft SQL Server uses UCS2 encoding with the VARCHAR length interpreted as UCS-2 points, that is length*2 = bytes length ( https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017):
their NVARCHAR(1) can store 1 UCS2 Characters (2 bytes of UCS2). Oracle UTF-encoding has the same semantics ( and internal CESU-8 storage).

Microsoft SQL Server 使用 UCS2 编码,将 VARCHAR 长度解释为 UCS-2 点,即长度*2 = 字节长度(https://docs.microsoft.com/en-us/sql/t-sql/data-types/ nchar-and-nvarchar-transact-sql?view=sql-server-2017):
它们的 NVARCHAR(1) 可以存储 1 个 UCS2 字符(UCS2 的 2 个字节)。Oracle UTF 编码具有相同的语义(和内部 CESU-8 存储)。

Postgres 9.1 only has a Unicode UTF-8 character set (https://www.postgresql.org/docs/9.1/multibyte.html) , which, like Oracle (in AL32UTF8 or AL16UTF16 encoding) can store 1 full UCS32 codepoints. That is potentially up to 4 bytes (See e.g http://www.oracletutorial.com/oracle-basics/oracle-nvarchar2/which explicitly state the NVARCHAR2(50) column may take up to 200 bytes).

Postgres 9.1 只有一个 Unicode UTF-8 字符集 ( https://www.postgresql.org/docs/9.1/multibyte.html) ,它和 Oracle 一样(在 AL32UTF8 或 AL16UTF16 编码中)可以存储 1 个完整的 UCS32 代码点。这可能最多 4 个字节(参见例如 http://www.oracletutorial.com/oracle-basics/oracle-nvarchar2/,其中明确说明 NVARCHAR2(50) 列最多可能占用 200 个字节)。

The difference becomes significant when dealing with characters outside the basic multilingual plane which count as one "char unit" in utf8 ucs32 (go, char, char32_t, PostgreSQL ), but are represented as surrogate pairs in UTF-16 which count as two units ( Java, Javascript, C#, ABAP, wchar_t , SQLServer).

在处理基本多语言平面之外的字符时,差异变得显着,这些字符在 utf8 ucs32 (go, char, char32_t, PostgreSQL ) 中计为一个“字符单元”,但在 UTF-16 中表示为代理对,计为两个单元 ( Java、Javascript、C#、ABAP、wchar_t、SQLServer)。

e.g. U+1F60A SMILING FACE WITH SMILING EYES will use up all space in SQL Server NVARCHAR(2). But only one character unit in PostgreSQL.

例如,U+1F60A SMILING FACE WITH SMILING EYES 将用完 SQL Server NVARCHAR(2) 中的所有空间。但是PostgreSQL中只有一个字符单元。

Classical enterprise grade DBs will offer at least a choice with UTF-16 like semantics (SAP HANA (CESU-8), DB 2 with collation, SQL Anywhere (CESU8BIN), ...) E.g. Oracle also offers what they misleadingly call an UTF-8 Collation, which is effectivly CESU-8. This has the same length semantics, representable content as UTF-16 (=Microsoft SQL Server) and is a suitable collation used by UTF-16 based enterprise systems ( e.g. SAP R/3 ) or under a Java application server.

经典的企业级 DB 至少会提供类似 UTF-16 的语义选择(SAP HANA (CESU-8)、带排序规则的 DB 2、SQL Anywhere (CESU8BIN),...)例如 Oracle 还提供了他们误导性地称为 UTF -8 整理,实际上是 CESU-8。这具有与 UTF-16 (=Microsoft SQL Server) 相同的长度语义、可表示的内容,并且是基于 UTF-16 的企业系统(例如 SAP R/3)或在 Java 应用程序服务器下使用的合适的归类。

Note that some databases may still interpret NVARCHAR(N) as a length in byte limitation, even with a variable length unicode encoding ( Example SAP IQ ).

请注意,即使使用可变长度 unicode 编码(示例 SAP IQ),某些数据库仍可能将 NVARCHAR(N) 解释为字节长度限制。

2) Unrepresentable content

2) 不可代表的内容

UTF-16 / CESU-8 based system can represent half surrogate pairs, while UTF-8/UTF-32 based system can not. This content is unrepresentablein this character set, but are a frequent occurrence in UTF-16 based enterprise systems. (e.g. Windows pathnames may contain such non-utf-8 representable characters, see e.g. https://github.com/rust-lang/rust/issues/12056). Thus UTF-16 is a "superset" of UTF-8/UTF-16 which is typically a killer-criteria when dealing with data from enterprise/os-systems based on this encoding ( SAP, Windows, Java, JavaScript ). Note that Javascript JSON encoding took specific care to be able to represent these characters (https://tools.ietf.org/html/rfc8259#page-10).

基于 UTF-16/CESU-8 的系统可以表示半代理对,而基于 UTF-8/UTF-32 的系统则不能。此内容是不可表示的字符集中,但在基于UTF-16的企业系统经常发生。(例如,Windows 路径名可能包含此类非 utf-8 可表示的字符,参见例如https://github.com/rust-lang/rust/issues/12056)。因此,UTF-16 是 UTF-8/UTF-16 的“超集”,在基于这种编码(SAP、Windows、Java、JavaScript)处理来自企业/操作系统系统的数据时,它通常是一个杀手标准。请注意,Javascript JSON 编码特别注意能够表示这些字符 ( https://tools.ietf.org/html/rfc8259#page-10)。

(2) and (3) are more relevant when migration queries, but not for data migration.

(2) 和 (3) 在迁移查询时更相关,但与数据迁移无关。

3) Binary sort order:

3)二进制排序:

Note that binary sort order of CESU-8/UTF-16 is different than UTF-8/UTF-32.

请注意,CESU-8/UTF-16 的二进制排序顺序与 UTF-8/UTF-32 不同。

UTF-16/CESU-8/Java/JavaScript/ABAP sort order:

UTF-16/CESU-8/Java/JavaScript/ABAP 排序顺序:

U+0041  LATIN CAPITAL LETTER A
U+1F60A SMILING FACE WITH SMILING EYES
U+FB03  LATIN SMALL LIGATURE ffi 

UTF-8 / UCS-32 (go) sort order:

UTF-8 / UCS-32 (go) 排序顺序:

U+0041  LATIN CAPITAL LETTER A
U+FB03  LATIN SMALL LIGATURE ffi 
U+1F60A SMILING FACE WITH SMILING EYES

4) Padding semantics

4) 填充语义

Padding semantics differ on databases esp. when comparing VARCHAR with CHAR content.

填充语义在数据库上有所不同,尤其是。将 VARCHAR 与 CHAR 内容进行比较时。

回答by user6649841

Standard TEXT datatype is perfectly fine for it.

标准 TEXT 数据类型非常适合它。