MySQL: Best practices for SQL varchar column length

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/8295131/

Date: 2020-08-31 21:50:41. Source: igfitidea

Best practices for SQL varchar column length

Tags: mysql, sql, sql-server, postgresql

Asked by esskar

Every time I set up a new SQL table or add a new varchar column to an existing table, I wonder about one thing: what is the best value for the length?

So, let's say you have a column called name of type varchar, and you have to choose its length. I cannot think of a name longer than 20 characters, but you never know. Instead of using 20, I always round up to the next 2^n number. In this case, I would choose 32 as the length. I do that because, from a computer scientist's point of view, a number 2^n looks more even to me than other numbers, and I'm just assuming that the architecture underneath can handle those numbers slightly better than others.

On the other hand, MSSQL Server, for example, sets the default length to 50 when you choose to create a varchar column. That got me thinking. Why 50? Is it just a random number, or is it based on average column length, or what?

It could also be (and probably is) that different SQL server implementations (like MySQL, MSSQL, Postgres, ...) have different best column length values.

Accepted answer by a_horse_with_no_name

No DBMS I know of has any "optimization" that will make a VARCHAR with a 2^n length perform better than one with a max length that is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently than one with a higher maximum length. I don't know if this is still the case.

For almost all DBMS, the actual storage that is required is determined only by the number of characters you put into the column, not by the max length you define. So from a storage point of view (and most probably a performance one as well), it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

You should see the max length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction and a CHECK CONSTRAINT that limits the number of characters to whatever your business requires.

If that requirement changes, altering the check constraint is much faster than altering the table (because the table does not need to be rewritten).
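
To make the "length as a constraint" idea concrete, here is a minimal sketch in Python. It uses SQLite's in-memory database from the standard library purely as a stand-in for PostgreSQL, and the table name person, the column name, and the limit of 35 characters are all assumptions made for illustration, not something taken from the answer.

```python
import sqlite3

# SQLite is only a stand-in here; the answer itself talks about PostgreSQL.
# Table name, column name and the 35-character limit are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "  name TEXT NOT NULL CHECK (length(name) <= 35)"
    ")"
)


def try_insert(name):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Raised by sqlite3 when a CHECK constraint fails.
        return False


accepted_short = try_insert("Smith")    # 5 characters, within the limit
accepted_long = try_insert("x" * 36)    # 36 characters, one over the limit
```

The column type itself carries no length; the business rule lives entirely in the constraint. In PostgreSQL the same idea would presumably be written as a text column with a constraint such as CHECK (char_length(name) <= 35).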

The same can be applied for Oracle and others; in Oracle it would be VARCHAR(4000) instead of text, though.

I don't know if there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) as compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment)

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer) a change to the column definition did rewrite the whole table, see e.g. here. Since 9.2 this is no longer the case, and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed took only 0.5 seconds.

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column. But I could not find any reference for that.

For MySQL the manual says: "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL, however, you cannot use the "workaround" of a check constraint to limit the number of characters in a column.

For SQL Server I could not find a clear statement on this, but the execution time to increase the size of a varchar column (again the 1.2 million row table from above) indicates that no rewrite takes place.

Edit 2017-01-24

Seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand that shows that the declared length of nvarchar or varchar columns makes a huge difference for the performance.

Answer by Ariel

VARCHAR(255) and VARCHAR(2) take exactly the same amount of space on disk! So the only reason to limit it is if you have a specific need for it to be smaller. Otherwise make them all 255.
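
The claim can be checked with a little arithmetic. The sketch below is a simplified model, not MySQL's actual storage code: it assumes the rule from the MySQL manual that a VARCHAR value is stored as a length prefix (1 byte if the column can never need more than 255 bytes, otherwise 2) plus the bytes of the value itself. The function name stored_bytes and its parameters are made up for this illustration, and storage-engine specifics are ignored.

```python
def stored_bytes(value, declared_chars, encoding="latin-1", max_bytes_per_char=1):
    """Approximate bytes needed for one VARCHAR value: length prefix + data.

    Simplified model; real storage engines add their own per-row overhead.
    """
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared column length")
    # One length byte if the column can never exceed 255 bytes, otherwise two.
    prefix = 1 if declared_chars * max_bytes_per_char <= 255 else 2
    return prefix + len(value.encode(encoding))


small = stored_bytes("ab", 2)     # value stored in a VARCHAR(2)
large = stored_bytes("ab", 255)   # same value stored in a VARCHAR(255)
huge = stored_bytes("ab", 500)    # same value stored in a VARCHAR(500)
```

Under this model the two-character value costs the same in VARCHAR(2) and VARCHAR(255); only once the declared maximum passes 255 bytes does the prefix grow by one byte.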

Specifically, when sorting, larger columns do take up more space, so if that hurts performance, then you need to worry about it and make them smaller. But if you only ever select 1 row from that table, then you can just make them all 255 and it won't matter.

See: What are the optimum varchar sizes for MySQL?

Answer by Kit

Whenever I set up a new SQL table I feel the same way about 2^n being more "even"... but to sum up the answers here, there is no significant impact on storage space simply by defining varchar(2^n) or even varchar(MAX).

That said, you should still anticipate the potential implications for storage and performance when setting a high varchar() limit. For example, let's say you create a varchar(MAX) column to hold product descriptions with full-text indexing. If 99% of descriptions are only 500 characters long, and then suddenly you get somebody who replaces said descriptions with Wikipedia articles, you may notice unanticipated significant storage and performance hits.

Another thing to consider from Bill Karwin:

There's one possible performance impact: in MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
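
A rough back-of-the-envelope sketch of that effect, in Python: it compares the memory a fixed-width layout would use when every value is padded to the declared maximum against the bytes the values really contain. The sample names, the row count, and the two declared lengths (35 and 255) are assumptions for illustration, and per-row overhead is ignored.

```python
def padded_bytes(row_count, declared_chars, bytes_per_char=1):
    """Bytes used if every value is padded out to the declared maximum."""
    return row_count * declared_chars * bytes_per_char


def actual_bytes(values, encoding="latin-1"):
    """Bytes the values themselves occupy, without any padding."""
    return sum(len(v.encode(encoding)) for v in values)


# Hypothetical data: 3000 short family names.
names = ["Smith", "Jones", "Featherstonehaugh"] * 1000

tight = padded_bytes(len(names), 35)    # column declared as VARCHAR(35)
loose = padded_bytes(len(names), 255)   # column declared as VARCHAR(255)
real = actual_bytes(names)              # what the data actually needs
```

With these made-up numbers the padded VARCHAR(255) layout needs several times the memory of the VARCHAR(35) one, even though the data is identical.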

Basically, just come up with reasonable business constraints and err on the side of a slightly larger size. As @onedaywhen pointed out, family names in the UK are usually between 1 and 35 characters. If you decide to make it varchar(64), you're not really going to hurt anything... unless you're storing this guy's family name that's said to be up to 666 characters long. In that case, maybe varchar(1028) makes more sense.

And in case it's helpful, here's what varchar 2^5 through 2^10 might look like if filled:

varchar(32)     Lorem ipsum dolor sit amet amet.

varchar(64)     Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie

varchar(128)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas

varchar(256)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt

varchar(512)    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt
                dolor tellus, sit amet porta neque varius vitae. Seduse molestie
                lacus id lacinia tempus. Vestibulum accumsan facilisis lorem, et
                mollis diam pretium gravida. In facilisis vitae tortor id vulput
                ate. Proin ornare arcu in sollicitudin pharetra. Crasti molestie

varchar(1024)   Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donecie
                vestibulum massa. Nullam dignissim elementum molestie. Vehiculas
                velit metus, sit amet tristique purus condimentum eleifend. Quis
                que mollis magna vel massa malesuada bibendum. Proinde tincidunt
                dolor tellus, sit amet porta neque varius vitae. Seduse molestie
                lacus id lacinia tempus. Vestibulum accumsan facilisis lorem, et
                mollis diam pretium gravida. In facilisis vitae tortor id vulput
                ate. Proin ornare arcu in sollicitudin pharetra. Crasti molestie
                dapibus leo lobortis eleifend. Vivamus vitae diam turpis. Vivamu
                nec tristique magna, vel tincidunt diam. Maecenas elementum semi
                quam. In ut est porttitor, sagittis nulla id, fermentum turpist.
                Curabitur pretium nibh a imperdiet cursus. Sed at vulputate este
                proin fermentum pretium justo, ac malesuada eros et Pellentesque
                vulputate hendrerit molestie. Aenean imperdiet a enim at finibus
                fusce ut ullamcorper risus, a cursus massa. Nunc non dapibus vel
                Lorem ipsum dolor sit amet, consectetur Praesent ut ultrices sit

Answer by Oded

The best value is the one that is right for the data as defined in the underlying domain.

For some domains, VARCHAR(10) is right for the Name attribute; for other domains VARCHAR(255) might be the best choice.

Answer by Jon Black

Adding to a_horse_with_no_name's answer you might find the following of interest...

it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

-- try to create a table with max varchar length
drop table if exists foo;
create table foo(name varchar(65535) not null)engine=innodb;

MySQL Database Error: Row size too large.

-- try to create a table with max varchar length - 2 bytes for the length
drop table if exists foo;
create table foo(name varchar(65533) not null)engine=innodb;

Executed Successfully

-- try to create a table with max varchar length with nullable field
drop table if exists foo;
create table foo(name varchar(65533))engine=innodb;

MySQL Database Error: Row size too large.

-- try to create a table with max varchar length with nullable field
drop table if exists foo;
create table foo(name varchar(65532))engine=innodb;

Executed Successfully

Don't forget the length byte(s) and the nullable byte, so:

name varchar(100) not null will be 1 byte (length) + up to 100 chars (latin1)

name varchar(500) not null will be 2 bytes (length) + up to 500 chars (latin1)

name varchar(65533) not null will be 2 bytes (length) + up to 65533 chars (latin1)

name varchar(65532) will be 2 bytes (length) + up to 65532 chars (latin1) + 1 null byte
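
The byte counts above can be turned into a small check that reproduces the four CREATE TABLE results. This is a sketch under stated assumptions: a single latin1 column (1 byte per character), MySQL's 65,535-byte row-size limit, 1 or 2 length bytes, and 1 extra byte when the column is nullable. The function names row_bytes and fits are invented for this illustration.

```python
MAX_ROW_BYTES = 65535  # MySQL's row-size limit, as hit in the tests above


def row_bytes(declared_chars, nullable, bytes_per_char=1):
    """Worst-case row size for a table with one VARCHAR column."""
    data = declared_chars * bytes_per_char
    prefix = 1 if data <= 255 else 2      # length byte(s)
    null_flag = 1 if nullable else 0      # the "nullable byte"
    return data + prefix + null_flag


def fits(declared_chars, nullable):
    """True if the single-column row stays within the row-size limit."""
    return row_bytes(declared_chars, nullable) <= MAX_ROW_BYTES


results = {
    "varchar(65535) not null": fits(65535, nullable=False),
    "varchar(65533) not null": fits(65533, nullable=False),
    "varchar(65533)": fits(65533, nullable=True),
    "varchar(65532)": fits(65532, nullable=True),
}
```

Each entry in results is True where the corresponding statement above executed successfully and False where MySQL reported "Row size too large".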

Hope this helps :)

Answer by onedaywhen

Always check with your business domain expert. If that's you, look for an industry standard. If, for example, the domain in question is a natural person's family name (surname), then for a UK business I'd go to the UK Govtalk data standards catalogue for person information and discover that a family name will be between 1 and 35 characters.

Answer by user1041892

I haven't checked this lately, but I know in the past with Oracle that the JDBC driver would reserve a chunk of memory during query execution to hold the result set coming back. The size of the memory chunk is dependent on the column definitions and the fetch size. So the length of the varchar2 columns affects how much memory is reserved. This caused serious performance issues for me years ago as we always used varchar2(4000) (the max at the time) and garbage collection was much less efficient than it is today.

Answer by Dale Willis

In a sense you're right, although anything lower than 2^8 characters will still register as a byte of data.

If you account for the base character, that leaves anything with a VARCHAR < 255 consuming the same amount of space.

255 is a good baseline definition unless you particularly wish to curtail excessive input.
