什么是 PostgreSQL 等价于 SQL Server NVARCHAR?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1245217/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What Is The PostgreSQL Equivalent To SQL Server NVARCHAR?
提问by kevinw
If I have fields of NVARCHAR (or NTEXT) data type in a Microsoft SQL Server database, what would be the equivalent data type in a PostgreSQL database?
如果 Microsoft SQL Server 数据库中有 NVARCHAR(或 NTEXT)数据类型的字段,那么 PostgreSQL 数据库中的等效数据类型是什么?
采纳答案by karim79
I'm pretty sure postgres varchar is the same as Oracle/Sybase/MSSQL nvarchar even though it is not explicit in the manual:
我很确定 postgres varchar 与 Oracle/Sybase/MSSQL nvarchar 相同,即使它在手册中没有明确说明:
http://www.postgresql.org/docs/7.4/static/datatype-character.html
http://www.postgresql.org/docs/7.4/static/datatype-character.html
Encoding conversion functions are here:
编码转换函数在这里:
http://www.postgresql.org/docs/current/static/functions-string.htmlhttp://www.postgresql.org/docs/current/static/functions-string.html#CONVERSION-NAMES
http://www.postgresql.org/docs/current/static/functions-string.html http://www.postgresql.org/docs/current/static/functions-string.html#CONVERSION-NAMES
Example:
例子:
create table
nvctest (
utf8fld varchar(12)
);
insert into nvctest
select convert('PostgreSQL' using ascii_to_utf_8);
select * from nvctest;
Also, there is this responseto a similar question from a Postgresql rep:
此外,还有对来自 Postgresql 代表的类似问题的回应:
All of our TEXT datatypes are multibyte-capable, provided you've installed PostgreSQL correctly.
This includes: TEXT (recommended) VARCHAR CHAR
我们所有的 TEXT 数据类型都支持多字节,前提是您已正确安装 PostgreSQL。
这包括: TEXT(推荐) VARCHAR CHAR
回答by Magnus Hagander
It's varcharand text, assuming your database is in UNICODE encoding. If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.
它是varchar和text,假设您的数据库采用 UNICODE 编码。如果您的数据库采用非 UNICODE 编码,则没有特殊数据类型可以为您提供 unicode 字符串 - 您可以将其存储为字节流,但不会是字符串。
回答by user6649841
Short answer: There is no PostgreSQL equivalent to SQL Server NVARCHAR.
简短回答:没有与 SQL Server NVARCHAR 等效的 PostgreSQL。
The types of NVARCHAR(N) on different database are not equivalent. The standard allows for a wide choice of character collations and encodings/character sets. When dealing with unicode PostgreSQL and SQLServer fall into different camps and no equivalence exists.
不同数据库上的 NVARCHAR(N) 类型不等价。该标准允许广泛选择字符归类和编码/字符集。在处理 unicode 时,PostgreSQL 和 SQLServer 属于不同的阵营,不存在等价关系。
These differ w.r.t.
这些不同
- length semantics
- representable content
- sort order
- padding semantics
- 长度语义
- 可表示的内容
- 排序
- 填充语义
Thus moving data from one DB system (or encoding/character set) to another can lead to truncation/content loss.
因此,将数据从一个 DB 系统(或编码/字符集)移动到另一个可能会导致截断/内容丢失。
Specifically there is no equivalentbetween a PostgreSQL (9.1) character type and SQL Server NVARCHAR.
具体来说,PostgreSQL (9.1) 字符类型和 SQL Server NVARCHAR 之间没有等价物。
You may migrate the data to a PostgreSQL binary type, but would then loose text querying capabilities.
您可以将数据迁移到 PostgreSQL 二进制类型,但会失去文本查询功能。
(Unless PostgreSQL starts supporting a UTF-16 based unicode character set)
(除非 PostgreSQL 开始支持基于 UTF-16 的 unicode 字符集)
1) Length semantics
1) 长度语义
N is interpreted differently (Characters, Bytes, 2*N = Bytes) depending on database and encoding.
N 有不同的解释(字符、字节、2*N = 字节),具体取决于数据库和编码。
Microsoft SQL Server uses UCS2 encoding with the VARCHAR length interpreted as UCS-2 points, that is length*2 = bytes length ( https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017):
their NVARCHAR(1) can store 1 UCS2 Characters (2 bytes of UCS2).
Oracle UTF-encoding has the same semantics ( and internal CESU-8 storage).
Microsoft SQL Server 使用 UCS2 编码,将 VARCHAR 长度解释为 UCS-2 点,即长度*2 = 字节长度(https://docs.microsoft.com/en-us/sql/t-sql/data-types/ nchar-and-nvarchar-transact-sql?view=sql-server-2017):
它们的 NVARCHAR(1) 可以存储 1 个 UCS2 字符(UCS2 的 2 个字节)。Oracle UTF 编码具有相同的语义(和内部 CESU-8 存储)。
Postgres 9.1 only has a Unicode UTF-8 character set (https://www.postgresql.org/docs/9.1/multibyte.html) , which, like Oracle (in AL32UTF8 or AL16UTF16 encoding) can store 1 full UCS32 codepoints. That is potentially up to 4 bytes (See e.g http://www.oracletutorial.com/oracle-basics/oracle-nvarchar2/which explicitly state the NVARCHAR2(50) column may take up to 200 bytes).
Postgres 9.1 只有一个 Unicode UTF-8 字符集 ( https://www.postgresql.org/docs/9.1/multibyte.html) ,它和 Oracle 一样(在 AL32UTF8 或 AL16UTF16 编码中)可以存储 1 个完整的 UCS32 代码点。这可能最多 4 个字节(参见例如 http://www.oracletutorial.com/oracle-basics/oracle-nvarchar2/,其中明确说明 NVARCHAR2(50) 列最多可能占用 200 个字节)。
The difference becomes significant when dealing with characters outside the basic multilingual plane which count as one "char unit" in utf8 ucs32 (go, char, char32_t, PostgreSQL ), but are represented as surrogate pairs in UTF-16 which count as two units ( Java, Javascript, C#, ABAP, wchar_t , SQLServer).
在处理基本多语言平面之外的字符时,差异变得显着,这些字符在 utf8 ucs32 (go, char, char32_t, PostgreSQL ) 中计为一个“字符单元”,但在 UTF-16 中表示为代理对,计为两个单元 ( Java、Javascript、C#、ABAP、wchar_t、SQLServer)。
e.g. U+1F60A SMILING FACE WITH SMILING EYES will use up all space in SQL Server NVARCHAR(2). But only one character unit in PostgreSQL.
例如,U+1F60A SMILING FACE WITH SMILING EYES 将用完 SQL Server NVARCHAR(2) 中的所有空间。但是PostgreSQL中只有一个字符单元。
Classical enterprise grade DBs will offer at least a choice with UTF-16 like semantics (SAP HANA (CESU-8), DB 2 with collation, SQL Anywhere (CESU8BIN), ...) E.g. Oracle also offers what they misleadingly call an UTF-8 Collation, which is effectivly CESU-8. This has the same length semantics, representable content as UTF-16 (=Microsoft SQL Server) and is a suitable collation used by UTF-16 based enterprise systems ( e.g. SAP R/3 ) or under a Java application server.
经典的企业级 DB 至少会提供类似 UTF-16 的语义选择(SAP HANA (CESU-8)、带排序规则的 DB 2、SQL Anywhere (CESU8BIN),...)例如 Oracle 还提供了他们误导性地称为 UTF -8 整理,实际上是 CESU-8。这具有与 UTF-16 (=Microsoft SQL Server) 相同的长度语义、可表示的内容,并且是基于 UTF-16 的企业系统(例如 SAP R/3)或在 Java 应用程序服务器下使用的合适的归类。
Note that some databases may still interpret NVARCHAR(N) as a length in byte limitation, even with a variable length unicode encoding ( Example SAP IQ ).
请注意,即使使用可变长度 unicode 编码(示例 SAP IQ),某些数据库仍可能将 NVARCHAR(N) 解释为字节长度限制。
2) Unrepresentable content
2) 不可代表的内容
UTF-16 / CESU-8 based system can represent half surrogate pairs, while UTF-8/UTF-32 based system can not. This content is unrepresentablein this character set, but are a frequent occurrence in UTF-16 based enterprise systems. (e.g. Windows pathnames may contain such non-utf-8 representable characters, see e.g. https://github.com/rust-lang/rust/issues/12056). Thus UTF-16 is a "superset" of UTF-8/UTF-16 which is typically a killer-criteria when dealing with data from enterprise/os-systems based on this encoding ( SAP, Windows, Java, JavaScript ). Note that Javascript JSON encoding took specific care to be able to represent these characters (https://tools.ietf.org/html/rfc8259#page-10).
基于 UTF-16/CESU-8 的系统可以表示半代理对,而基于 UTF-8/UTF-32 的系统则不能。此内容是不可表示的字符集中,但在基于UTF-16的企业系统经常发生。(例如,Windows 路径名可能包含此类非 utf-8 可表示的字符,参见例如https://github.com/rust-lang/rust/issues/12056)。因此,UTF-16 是 UTF-8/UTF-16 的“超集”,在基于这种编码(SAP、Windows、Java、JavaScript)处理来自企业/操作系统系统的数据时,它通常是一个杀手标准。请注意,Javascript JSON 编码特别注意能够表示这些字符 ( https://tools.ietf.org/html/rfc8259#page-10)。
(2) and (3) are more relevant when migration queries, but not for data migration.
(2) 和 (3) 在迁移查询时更相关,但与数据迁移无关。
3) Binary sort order:
3)二进制排序:
Note that binary sort order of CESU-8/UTF-16 is different than UTF-8/UTF-32.
请注意,CESU-8/UTF-16 的二进制排序顺序与 UTF-8/UTF-32 不同。
UTF-16/CESU-8/Java/JavaScript/ABAP sort order:
UTF-16/CESU-8/Java/JavaScript/ABAP 排序顺序:
U+0041 LATIN CAPITAL LETTER A
U+1F60A SMILING FACE WITH SMILING EYES
U+FB03 LATIN SMALL LIGATURE ffi
UTF-8 / UCS-32 (go) sort order:
UTF-8 / UCS-32 (go) 排序顺序:
U+0041 LATIN CAPITAL LETTER A
U+FB03 LATIN SMALL LIGATURE ffi
U+1F60A SMILING FACE WITH SMILING EYES
4) Padding semantics
4) 填充语义
Padding semantics differ on databases esp. when comparing VARCHAR with CHAR content.
填充语义在数据库上有所不同,尤其是。将 VARCHAR 与 CHAR 内容进行比较时。
回答by user6649841
Standard TEXT datatype is perfectly fine for it.
标准 TEXT 数据类型非常适合它。