Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7320316/

Date: 2020-10-20 23:16:54  Source: igfitidea

Why specify a length for character varying types

Tags: database, database-design, postgresql, types, varchar

Asked by Mr Shoubs

Referring to the Postgres Documentation on Character Types, I am unclear on the point of specifying a length for character varying (varchar) types.

Assumptions:

  • the length of the string doesn't matter to the application
  • you don't care if someone stores a string of the maximum size in the database
  • you have unlimited hard disk space

It does mention:

The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different.)

This talks about the size of the string, not the declared size of the field (i.e. it sounds as though a large string in a large varchar field will always be compressed, but a small string in a large varchar field will not?)
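
The arithmetic in the quoted passage can be sketched in a few lines of Python. This is only a rough model of the numbers stated in the documentation excerpt (1 byte of overhead up to 126 bytes, 4 bytes beyond that); it deliberately ignores compression and out-of-line storage, and the function name `varchar_storage_bytes` is made up for illustration. Note that the declared limit n does not appear anywhere in the calculation, only the bytes of the value actually stored.

```python
SHORT_HEADER_LIMIT = 126  # bytes, taken from the documentation excerpt


def varchar_storage_bytes(value: str, encoding: str = "utf-8") -> int:
    """Approximate uncompressed size of one stored string, header included.

    Hypothetical helper: models only the 1-byte / 4-byte overhead rule
    from the quoted documentation, not compression or background tables.
    """
    payload = len(value.encode(encoding))  # bytes, not characters
    header = 1 if payload <= SHORT_HEADER_LIMIT else 4
    return header + payload


if __name__ == "__main__":
    for text in ("abc", "é" * 100, "x" * 400):
        print(len(text), "chars ->", varchar_storage_bytes(text), "bytes")
```

Under this model a 400-character ASCII string costs the same 404 bytes whether the column is declared varchar(500) or left unbounded, and 100 two-byte characters already exceed the short-string limit, which is why the documentation distinguishes characters from bytes.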

I ask because it would be much easier (and lazier) to specify a much larger size so that you never have to worry about a string being too large. For example, if I specify varchar(50) for a place name, I will eventually get locations with more characters than that (e.g. Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch), but if I specify varchar(100) or varchar(500), I'm less likely to hit that problem.

So would there be a performance difference between varchar(500) and (arbitrarily) varchar(5000000) or text if your largest string was, say, 400 characters long?

Also, out of interest, if anyone who answers this knows how other databases behave, please add that too.

I have googled, but not found a sufficiently technical explanation.

Accepted answer by Marco Mariani

My understanding is that constraints are useful for data integrity, so I use column sizes both to validate the data items at the lower layer and to better describe the data model.
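
To make the validation idea concrete, here is a small Python model of what a length limit does when a value is assigned to a varchar(n) column. It is a sketch of the behaviour as I understand it from the PostgreSQL documentation (over-length values are rejected, unless the excess consists only of spaces, in which case it is trimmed), not code that talks to a database. The names `store_varchar` and `ValueTooLong` are invented for the example, and Python's `len` is assumed to be a good enough stand-in for the character count.

```python
class ValueTooLong(ValueError):
    """Stand-in for the error a database raises on an over-length value."""


def store_varchar(value: str, n: int) -> str:
    """Model assigning `value` to a varchar(n) column (illustrative only)."""
    if len(value) <= n:
        return value
    excess = value[n:]
    if excess.strip(" ") == "":
        # Excess made up solely of spaces is truncated rather than rejected.
        return value[:n]
    raise ValueTooLong(f"value too long for type character varying({n})")


PLACE_NAME = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"

if __name__ == "__main__":
    print(store_varchar(PLACE_NAME, 100))  # fits within the limit
    try:
        store_varchar(PLACE_NAME, 50)
    except ValueTooLong as err:
        print("rejected:", err)
```

The place name from the question passes with a limit of 100 and is rejected with a limit of 50, which is the trade-off being discussed: the limit documents and enforces an expectation, at the cost of refusing data that turns out to be legitimate.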

Some links on the matter:

Answer by Sean McMains

My understanding is that this is a legacy of older databases with storage that wasn't as flexible as that of Postgres. Some would use fixed-length structures to make it easy to find particular records and, since SQL is a somewhat standardized language, that legacy is still seen even when it doesn't provide any practical benefit.

Thus, your "make it big" approach should be an entirely reasonable one with Postgres, but it may not transfer well to other, less flexible RDBMSs.

Answer by Peter Eisentraut

The documentation explains this:

If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

The SQL standard requires a length specification for all its types. This is probably mainly for legacy reasons. Among PostgreSQL users, the preference tends to be to omit the length specification, but if you want to write portable code, you have to include it (and pick an arbitrary size, in many cases).

Answer by silvertc

Three more thoughts:

  1. The Postgres doc says that 'very long values are also stored in background tables'. Thus, defining all strings as unbounded likely pushes them into background tables -- for sure a performance hit.

  2. Declaring everything as very long interferes with the DB's efforts to predict a query execution plan, because it has less knowledge of the data.

  3. Building a b-tree to contain an index would also be thrown off because it would not be able to guess a reasonable packing strategy. For example if gender was TEXT, how would you know it's all only M or F?
