string PostgreSQL:text 和 varchar 的区别(字符不同)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4848964/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-09 00:58:26  来源:igfitidea点击:

PostgreSQL: Difference between text and varchar (character varying)

stringpostgresqltexttypesvarchar

提问by Adam Matan

What's the difference between the textdata type and the character varying(varchar) data types?

text数据类型和character varying( varchar) 数据类型之间有什么区别?

According to the documentation

根据文档

If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

如果在没有长度说明符的情况下使用字符变化,则该类型接受任何大小的字符串。后者是 PostgreSQL 扩展。

and

In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.

另外,PostgreSQL 提供了 text 类型,可以存储任意长度的字符串。尽管类型文本不在 SQL 标准中,但其他几个 SQL 数据库管理系统也有。

So what's the difference?

那么有什么区别呢?

回答by Frank Heikens

There is no difference, under the hood it's all varlena(variable length array).

没有区别,在引擎盖下都是varlena可变长度数组)。

Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

从 Depesz 查看这篇文章:http: //www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

A couple of highlights:

几个亮点:

To sum it all up:

  • char(n) – takes too much space when dealing with values shorter than n(pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit
  • varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
  • varchar – just like text
  • text – for me a winner– over (n) data types because it lacks their problems, and over varchar – because it has distinct name

总结一下:

  • char(n) – 在处理短于n(将它们填充为n)的值时占用太多空间,并且由于添加尾随空格可能导致细微错误,而且更改限制是有问题的
  • varchar(n) – 在实时环境中更改限制是有问题的(更改表时需要排他锁)
  • varchar——就像文本一样
  • text——对我来说是赢家——超过 (n) 个数据类型,因为它没有问题,超过 varchar——因为它有不同的名称

The article does detailed testing to show that the performance of inserts and selects for all 4 data types are similar. It also takes a detailed look at alternate ways on constraining the length when needed. Function based constraints or domains provide the advantage of instant increase of the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.

文章做了详细的测试,以表明所有 4 种数据类型的插入和选择的性能相似。它还详细介绍了在需要时限制长度的替代方法。基于函数的约束或域提供了立即增加长度约束的优势,并且基于减少字符串长度约束很少见,depesz 得出结论,其中之一通常是长度限制的最佳选择。

回答by George

As "Character Types" in the documentation points out, varchar(n), char(n), and textare all stored the same way. The only difference is extra cycles are needed to check the length, if one is given, and the extra space and time required if padding is needed for char(n).

正如文档中的“字符类型”指出的那样varchar(n)char(n)、 和text都以相同的方式存储。唯一的区别是需要额外的周期来检查长度(如果给出),如果需要填充,则需要额外的空间和时间char(n)

However, when you only need to store a single character, there is a slight performance advantage to using the special type "char"(keep the double-quotes — they're part of the type name). You get faster access to the field, and there is no overhead to store the length.

但是,当您只需要存储单个字符时,使用特殊类型有轻微的性能优势"char"(保留双引号——它们是类型名称的一部分)。您可以更快地访问该字段,并且没有存储长度的开销。

I just made a table of 1,000,000 random "char"chosen from the lower-case alphabet. A query to get a frequency distribution (select count(*), field ... group by field) takes about 650 milliseconds, vs about 760 on the same data using a textfield.

我刚刚制作了一张"char"从小写字母表中随机选择的 1,000,000 个表格。获取频率分布 ( select count(*), field ... group by field) 的查询需要大约 650 毫秒,而使用text字段的相同数据大约需要 760 毫秒。

回答by Peter Krauss

UPDATING BENCHMARKS FOR 2016 (pg9.5+)

更新 2016 年的基准(pg9.5+)

And using "pure SQL" benchmarks (without any external script)

并使用“纯 SQL”基准(没有任何外部脚本)

  1. use any string_generator with UTF8

  2. main benchmarks:

    2.1. INSERT

    2.2. SELECT comparing and counting

  1. 使用任何带有 UTF8 的 string_generator

  2. 主要基准:

    2.1. 插入

    2.2. SELECT 比较和计数



CREATE FUNCTION string_generator(int DEFAULT 20,int DEFAULT 10) RETURNS text AS $f$
  SELECT array_to_string( array_agg(
    substring(md5(random()::text),1,)||chr( 9824 + (random()*10)::int )
  ), ' ' ) as s
  FROM generate_series(1, ) i(x);
$f$ LANGUAGE SQL IMMUTABLE;

Prepare specific test (examples)

准备特定测试(示例)

DROP TABLE IF EXISTS test;
-- CREATE TABLE test ( f varchar(500));
-- CREATE TABLE test ( f text); 
CREATE TABLE test ( f text  CHECK(char_length(f)<=500) );

Perform a basic test:

执行基本测试:

INSERT INTO test  
   SELECT string_generator(20+(random()*(i%11))::int)
   FROM generate_series(1, 99000) t(i);

And other tests,

和其他测试,

CREATE INDEX q on test (f);

SELECT count(*) FROM (
  SELECT substring(f,1,1) || f FROM test WHERE f<'a0' ORDER BY 1 LIMIT 80000
) t;

... And use EXPLAIN ANALYZE.

... 并使用EXPLAIN ANALYZE.

UPDATED AGAIN 2018 (pg10)

2018 年再次更新(第 10 页)

little edit to add 2018's results and reinforce recommendations.

添加 2018 年的结果并加强建议的小编辑。



Results in 2016 and 2018

2016 年和 2018 年的结果

My results, after average, in many machines and many tests: all the same
(statistically less tham standard deviation).

我的结果,经过平均,在许多机器和许多测试中:都一样
(统计上小于标准偏差)。

Recommendation

推荐

  • Use textdatatype,
    avoid old varchar(x)because sometimes it is not a standard, e.g. in CREATE FUNCTIONclauses varchar(x)varchar(y).

  • express limits (with same varcharperformance!) by with CHECKclause in the CREATE TABLE
    e.g. CHECK(char_length(x)<=10).
    With a negligible loss of performance in INSERT/UPDATE you can also to control ranges and string structure
    e.g. CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%')

  • 使用text数据类型,
    避免使用oldvarchar(x)因为有时它不是标准,例如在 CREATE FUNCTION子句中 varchar(x)varchar(y)

  • varchar通过CHECKCREATE TABLE
    eg 中的with子句表达限制(具有相同的性能!)CHECK(char_length(x)<=10)
    由于 INSERT/UPDATE 中的性能损失可以忽略不计,您还可以控制范围和字符串结构,
    例如CHECK(char_length(x)>5 AND char_length(x)<=20 AND x LIKE 'Hello%')

回答by Peter Krauss

On PostgreSQL manual

关于 PostgreSQL 手册

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
除了使用空白填充类型时增加的存储空间以及存储到长度受限列时检查长度的一些额外 CPU 周期之外,这三种类型之间没有性能差异。虽然 character(n) 在其他一些数据库系统中具有性能优势,但在 PostgreSQL 中没有这样的优势;事实上 character(n) 通常是三个中最慢的,因为它有额外的存储成本。在大多数情况下,应改用文本或字符变化。

I usually use text

我通常使用文本

References: http://www.postgresql.org/docs/current/static/datatype-character.html

参考资料:http: //www.postgresql.org/docs/current/static/datatype-character.html

回答by sotn

In my opinion, varchar(n)has it's own advantages. Yes, they all use the same underlying type and all that. But, it should be pointed out that indexes in PostgreSQL has its size limit of 2712 bytesper row.

在我看来,varchar(n)有它自己的优势。是的,它们都使用相同的底层类型等等。但是,应该指出的是,PostgreSQL 中的索引的大小限制为每行2712 字节

TL;DR:If you use texttype without a constraintand have indexes on these columns, it is very possible that you hit this limit for some of your columns and get error when you try to insert data but with using varchar(n), you can prevent it.

TL;DR:如果您使用没有约束的text类型并且在这些列上有索引,则很可能您的某些列达到了此限制并在尝试插入数据时出错,但使用 using可以防止它。varchar(n)

Some more details:The problem here is that PostgreSQL doesn't give any exceptions when creating indexes for texttype or varchar(n)where nis greater than 2712. However, it will give error when a record with compressed size of greater than 2712 is tried to be inserted. It means that you can insert 100.000 character of string which is composed by repetitive characters easily because it will be compressed far below 2712 but you may not be able to insert some string with 4000 characters because the compressed size is greater than 2712 bytes. Using varchar(n)where nis not too muchgreater than 2712, you're safe from these errors.

更多细节:这里的问题是PostgreSQL在为texttype或varchar(n)wheren大于2712创建索引时不会给出任何异常。但是,当尝试插入压缩大小大于2712的记录时会出错。这意味着您可以轻松插入由重复字符组成的 100.000 个字符的字符串,因为它会被压缩远低于 2712,但您可能无法插入一些 4000 个字符的字符串,因为压缩后的大小大于 2712 个字节。使用varchar(n)wheren不会比 2712大太多,您就可以避免这些错误。

回答by bpd

text and varchar have different implicit type conversions. The biggest impact that I've noticed is handling of trailing spaces. For example ...

text 和 varchar 具有不同的隐式类型转换。我注意到的最大影响是处理尾随空格。例如 ...

select ' '::char = ' '::varchar, ' '::char = ' '::text, ' '::varchar = ' '::text

returns true, false, trueand not true, true, trueas you might expect.

返回,true, false, true而不是true, true, true您所期望的。

回答by Greg

Somewhat OT: If you're using Rails, the standard formatting of webpages may be different. For data entry forms textboxes are scrollable, but character varying(Rails string) boxes are one-line. Show views are as long as needed.

有点 OT:如果您使用 Rails,网页的标准格式可能会有所不同。对于数据输入表单,text框是可滚动的,但character varying(Rails string) 框是一行的。根据需要显示视图。

回答by Chris Halcrow

A good explanation from http://www.sqlines.com/postgresql/datatypes/text:

来自http://www.sqlines.com/postgresql/datatypes/text 的一个很好的解释:

The only difference between TEXT and VARCHAR(n) is that you can limit the maximum length of a VARCHAR column, for example, VARCHAR(255) does not allow inserting a string more than 255 characters long.

Both TEXT and VARCHAR have the upper limit at 1 Gb, and there is no performance difference among them (according to the PostgreSQL documentation).

TEXT 和 VARCHAR(n) 之间的唯一区别是您可以限制 VARCHAR 列的最大长度,例如,VARCHAR(255) 不允许插入长度超过 255 个字符的字符串。

TEXT 和 VARCHAR 的上限都是 1 Gb,它们之间没有性能差异(根据 PostgreSQL 文档)。

回答by ofir_aghai

character varying(n), varchar(n)- (Both the same). value will be truncated to n characters without raising an error.

character(n), char(n)- (Both the same). fixed-length and will pad with blanks till the end of the length.

text- Unlimited length.

character varying(n), varchar(n)- (两者相同)。value 将被截断为 n 个字符而不会引发错误。

character(n), char(n)- (两者相同)。固定长度,并将填充空白直到长度结束。

text- 无限长度。

Example:

例子:

Table test:
   a character(7)
   b varchar(7)

insert "ok    " to a
insert "ok    " to b

We get the results:

我们得到结果:

a        | (a)char_length | b     | (b)char_length
----------+----------------+-------+----------------
"ok     "| 7              | "ok"  | 2