Does PostgreSQL varchar count length in Unicode characters or ASCII characters?

Question

Asked by Ben Lopatin

I tried importing a database dump from a SQL file, and the insert failed when inserting the string Mér into a field defined as varying(3). I didn't capture the exact error, but it pointed to that specific value and the varying(3) constraint.

Given that I considered this unimportant to what I was doing at the time, I just changed the value to Mer, it worked, and I moved on.

Does the limit on a varying field count the length of the byte string? What really boggles my mind is that this was dumped from another PostgreSQL database, so it doesn't make sense that a constraint could have allowed the value to be written in the first place.

Answer 1

Answered by araqnid

The length limit imposed by varchar(N) types, and the value calculated by the length function, are in characters, not bytes. So 'abcdef'::char(3) is truncated to 'abc' and 'a€cdef'::char(3) is truncated to 'a€c', even in a database encoded as UTF-8, where 'a€c' is encoded using 5 bytes.
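The difference between counting characters and counting bytes can be illustrated outside the database. The following is a minimal Python sketch, not PostgreSQL code: the helper name `char_and_byte_length` is hypothetical, and the euro sign is used only as an example of a character that takes three bytes in UTF-8.

```python
def char_and_byte_length(text, encoding="utf-8"):
    """Return (number of characters, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))


# Escapes are used so the samples do not depend on how this file is saved:
# "\u20ac" is the euro sign, "\u00e9" is a precomposed e-acute.
for sample in ("abc", "a\u20acc", "M\u00e9r"):
    chars, octets = char_and_byte_length(sample)
    print(f"{sample!r}: {chars} characters, {octets} bytes in UTF-8")
```

All three samples are three characters long, but only the pure-ASCII one is also three bytes long, which is why a character-based limit and a byte-based limit can disagree.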

If restoring a dump file complained that 'Mér' would not go into a varchar(3) column, that suggests you were restoring a UTF-8 encoded dump file into a SQL_ASCII database.

For example, I did this in a UTF-8 database:

create schema so4249745;
create table so4249745.t(key varchar(3) primary key);
insert into so4249745.t values('Mér');

And then dumped this and tried to load it into a SQL_ASCII database:

pg_dump -f dump.sql --schema=so4249745 --table=t
createdb -E SQL_ASCII -T template0 enctest
psql -f dump.sql enctest

And sure enough:

psql:dump.sql:34: ERROR:  value too long for type character varying(3)
CONTEXT:  COPY t, line 1, column key: "Mér"

By contrast, if I create the database enctest as encoding LATIN1 or UTF8, it loads fine.

This problem arises from the combination of dumping a database that uses a multi-byte character encoding and restoring it into a SQL_ASCII database. Using SQL_ASCII basically disables the transcoding of client data to server data and assumes one byte per character, leaving it to the clients to take responsibility for using the right character map. The dump file contains the stored string as UTF-8, which is four bytes, so a SQL_ASCII database sees four characters and therefore regards the value as violating the constraint. It then prints out the value, which my terminal reassembles as three characters.
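The reasoning above can be modelled in a few lines of Python. This is only a rough sketch of the counting rule described here, not how PostgreSQL implements the check; the function `fits_varchar` and its parameters are hypothetical names introduced for illustration, and the dump is assumed to be UTF-8 encoded.

```python
def fits_varchar(dump_bytes, limit, server_encoding):
    """Model whether a UTF-8 dump value passes a varchar(limit) check."""
    if server_encoding == "SQL_ASCII":
        # No transcoding happens: every byte is counted as one character.
        length = len(dump_bytes)
    else:
        # The server understands the client encoding and counts characters.
        length = len(dump_bytes.decode("utf-8"))
    return length <= limit


# 'Mér' as it appears in a UTF-8 dump file: 4 bytes for 3 characters.
dump_value = "M\u00e9r".encode("utf-8")

for encoding in ("UTF8", "LATIN1", "SQL_ASCII"):
    print(encoding, fits_varchar(dump_value, 3, encoding))
```

Under this model the value fits when the target database is UTF8 or LATIN1 and is rejected for SQL_ASCII, matching the behaviour reported above.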

Answer 2

Answered by vasquez

It depends on which encoding you chose when you created the database. createdb -E UNICODE creates a Unicode DB that should also accept multibyte characters and count each of them as one character.

You can use

psql -l

to see which encoding was used. This page has a table including information about how many bytes per character are used.
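To get a feel for what bytes per character means, the sketch below measures a few sample characters under two encodings. It uses Python's codec names ("utf-8", "latin-1"), which are assumed here to correspond to PostgreSQL's UTF8 and LATIN1; the helper `bytes_per_char` is a hypothetical name, and the sample characters are chosen only for illustration.

```python
def bytes_per_char(char, encoding):
    """Return the encoded size of one character, or None if not representable."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


# Sample characters: plain 'e', e-acute (U+00E9), euro sign (U+20AC).
samples = {"e": "e", "e-acute": "\u00e9", "euro": "\u20ac"}

for name, char in samples.items():
    for encoding in ("utf-8", "latin-1"):
        print(f"{name} in {encoding}: {bytes_per_char(char, encoding)}")
```

A single-byte encoding always uses one byte for the characters it supports, while UTF-8 uses a variable number of bytes per character.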
