Original question: http://stackoverflow.com/questions/1962310/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Importance of varchar length in MySQL table
Asked by Brian
I have a MySQL table where rows are inserted dynamically. Because I cannot be certain of the length of the strings and do not want them cut off, I make them varchar(200), which is generally much bigger than I need. Is there a big performance hit in giving a varchar field much more length than necessary?
Accepted answer by Alex Martelli
No, in the sense that if the values you're storing in that column are always (say) less than 50 characters, declaring the column as varchar(50) or varchar(200) has the same performance.
Answer by Bill Karwin
There's one possible performance impact: in MySQL, temporary tables and MEMORY tables store a VARCHAR column as a fixed-length column, padded out to its maximum length. If you design VARCHAR columns much larger than the greatest size you need, you will consume more memory than you have to. This affects cache efficiency, sorting speed, etc.
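To put a rough number on that padding, here is a minimal Python sketch. It assumes the fixed-length behaviour described in this answer and ignores the length prefix and any per-row overhead; the function names and the workload figures (one million rows, latin1, values of at most 50 characters) are made up for illustration.

```python
def fixed_row_bytes(declared_chars: int, bytes_per_char: int) -> int:
    """Bytes reserved per value when a VARCHAR is padded to its declared maximum."""
    return declared_chars * bytes_per_char


def wasted_bytes(declared_chars: int, longest_actual_chars: int,
                 bytes_per_char: int, rows: int) -> int:
    """Extra bytes reserved across `rows` rows because the column is oversized."""
    reserved = fixed_row_bytes(declared_chars, bytes_per_char)
    needed = fixed_row_bytes(longest_actual_chars, bytes_per_char)
    return (reserved - needed) * rows


# Hypothetical workload: 1,000,000 rows, 1 byte per character,
# varchar(200) declared but no value longer than 50 characters.
print(wasted_bytes(200, 50, 1, 1_000_000))  # 150000000
```

Under those assumptions the oversized declaration reserves about 150 MB more than a varchar(50) would for the same data.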
Answer by OMG Ponies
VARCHAR is ideal for the situation you describe, because it stands for "variable character" - the limit, based on your example, would be 200 characters, but anything less is accepted and won't fill the allotted size of the column.
VARCHAR also takes less space - the values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
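The prefix rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above, not MySQL's own code; the function name and its parameters are hypothetical, and a single-byte character set is assumed by default.

```python
def varchar_storage_bytes(value: str, declared_chars: int,
                          max_bytes_per_char: int = 1,
                          encoding: str = "latin-1") -> int:
    """Length prefix plus data bytes for one stored VARCHAR value."""
    # The prefix size depends on the most bytes the column could ever hold,
    # not on the value actually stored.
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_column_bytes <= 255 else 2
    return prefix + len(value.encode(encoding))


print(varchar_storage_bytes("hello", 50))   # 6
print(varchar_storage_bytes("hello", 200))  # 6
print(varchar_storage_bytes("hello", 300))  # 7
```

The same five-character value costs the same in varchar(50) and varchar(200); only crossing the 255-byte threshold adds one byte.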
For more information comparing the MySQL CHAR to VARCHAR datatypes, see this link.
Answer by Nudge
Size is performance! The smaller the size, the better. Not today or tomorrow, but some day your tables will grow to a size where you hit serious bottlenecks, no matter what design you laid out. But in the design phase you can foresee some of the bottlenecks that are likely to appear first, and so extend the time your db performs fast and happily before you need to rethink your schema or scale horizontally by adding more servers.
In your case there are many performance leaks you can run into: Big joins are nearly impossible with long varchar columns. Indexing on those columns is a real killer. Your disk has to store the data. One memory page can hold fewer rows, and table scans will be much slower. Also, the query cache is unlikely to help you here.
You have to ask yourself: How many inserts per year may happen? What is the average length? Do I really need more than 200 characters or can I catch that in my application front-end, even by informing users about the maximum length? Can I split up the table into a narrow one for fast indexing and scanning and another one for holding additional, less frequently needed data of expanding size? Can I type the possible varchar data into categories and so extract some of the data into a few smaller, maybe int or bool-type columns and narrow the varchar column that way?
You can do a lot here. It may be best to go with a first assumption and then re-design step by step using real-life measured performance data. Good luck.
Answer by duffymo
Performance? No. Disk storage? Yes, but it's cheap and plentiful. Unless your database will grow to terabyte scale you're probably okay.
Answer by DCH
Some of you are mistaken thinking that a varchar(200) takes up more table size on disk than a varchar(20). This is not the case. Only when you go beyond 255 chars does MySQL use an extra byte to determine the length of the varchar field data.
Answer by Rizwan Kassim
There can be performance hits - but usually not on a level that most users would notice.
When the size of each field is known in advance, MySQL knows exactly how many bytes are between each field/row and can page forward without reading all the data. Using variable characters diminishes this ability for optimization.
Does varchar result in performance hit due to data fragmentation?
Even better, char vs varchar.
For most uses, you'll be fine with either - but there is a difference, and for large scale databases, there are reasons why you'd pick one or the other.
Answer by user2903114
As the datatype name suggests, VARCHAR is variable-character data storage: the MySQL engine itself allocates memory according to the stored data, so to my knowledge there is no performance hit.
Answer by jgmjgm
You should try to view a varchar column the same as you would a char column in most scenarios and set the length conservatively. You don't always have to think of the var modifier as something that affects your choice of maximum length. It should really be seen as a performance hint that the strings supplied will be of varying lengths.
It's not a directive that database internals have to follow strictly; it can be completely ignored. Do take care with this, however, as sometimes the implementation can leak (fixed length and padding, for example) even though it shouldn't in an ideal world.
If you have a varchar(255) then you have no guarantee that, performance-wise, it will behave any differently from a char(255) in all circumstances.
It can seem easy to set it at something such as 255, 65535, etc. in line with the advice given in the manual about storage requirements. This gives the impression that any value between 0 (yes, it's a thing) and 255 will have the same impact. However, that's not something that can be fully guaranteed.
Storage requirements do tend to be true or a good indicator for decent and mature persistent storage engines in terms of row storage. It isn't as strong an indicator for things such as indexes.
It's sometimes a difficult question exactly how long a piece of string should be, so it is tempting to set it to the highest bound you know it should fall within but that has no impact. Unfortunately this is often left to the user to work out and it's really somewhat arbitrary. You can't really say never oversize a string, because there may be cases where you're not exactly sure.
You should ensure that MySQL queries throw an error when a string is too long rather than truncating it, so that the errors at least tell you when a column might be too short. Resizing columns to enlarge or shrink them can be an expensive DDL operation; this should be kept in mind.
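In MySQL this behaviour is governed by the strict SQL modes (STRICT_TRANS_TABLES or STRICT_ALL_TABLES in sql_mode). The difference can be pictured with a toy Python model; it is not MySQL code, and the `store` function and `DataTooLongError` class are hypothetical names used only for this sketch.

```python
class DataTooLongError(ValueError):
    """Stands in for MySQL's 'Data too long for column' error."""


def store(value: str, declared_chars: int, strict: bool) -> str:
    """Return what ends up in the column, or raise in strict mode."""
    if len(value) <= declared_chars:
        return value
    if strict:
        # Strict mode: the statement fails, so you find out the column is too short.
        raise DataTooLongError(
            f"value of {len(value)} chars exceeds limit of {declared_chars}")
    # Non-strict mode: the value is cut off and the excess is lost.
    return value[:declared_chars]


print(store("abcdefgh", 5, strict=False))  # abcde
try:
    store("abcdefgh", 5, strict=True)
except DataTooLongError as exc:
    print("rejected:", exc)
```

With the strict variant, an undersized column shows up as failed statements rather than as silently shortened data.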
Character set should also be considered where length and performance come into play. The length refers to characters rather than bytes. If using utf8, for example (not utf8mb4), then varchar(255) is really varbinary(3 * 255). It's hard to know how things like this will really play out without running tests and looking deeply into source code/documentation. Because of this, there is scope for excessive length to have an unexpectedly inflated impact. This doesn't only apply to performance. If you one day need to change the character set of a varchar column to a larger one, you might end up hitting some limit with no recourse if you allowed gratuitously long strings that could have been avoided. This is normally a fairly niche problem but it does come up; it was recently a significant problem with the introduction of utf8mb4 for MySQL and indexes, which have a limit on key length.
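The character-set arithmetic is easy to check by hand. The Python sketch below assumes a 767-byte index key limit, which is the InnoDB limit for the older COMPACT and REDUNDANT row formats; your engine, row format, and version may use a different figure, so treat `limit_bytes` as a parameter to adjust. The function names are illustrative.

```python
# Maximum bytes one character can occupy in each MySQL character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_key_bytes(declared_chars: int, charset: str) -> int:
    """Worst-case byte length of a varchar(declared_chars) value."""
    return declared_chars * MAX_BYTES_PER_CHAR[charset]


def fits_index_limit(declared_chars: int, charset: str,
                     limit_bytes: int = 767) -> bool:
    """True if the whole column can be indexed within limit_bytes (assumed limit)."""
    return max_key_bytes(declared_chars, charset) <= limit_bytes


print(max_key_bytes(255, "utf8"), fits_index_limit(255, "utf8"))        # 765 True
print(max_key_bytes(255, "utf8mb4"), fits_index_limit(255, "utf8mb4"))  # 1020 False
print(767 // MAX_BYTES_PER_CHAR["utf8mb4"])                             # 191
```

Under that assumed limit, a varchar(255) column that was indexable in utf8 no longer fits after a switch to utf8mb4, while a column declared at 191 characters or fewer still does.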
If it turns out that MAX(LENGTH(column)) is always < 64 (such as if it was decided there would be a limit on input that wasn't matched by the column definition) but you have varchar(255) then there's a good chance that you'll be using four times more space than needed in some scenarios.
This might include:
- Different engines, some may ignore it altogether.
- Buffer sizes, for example update or insert might have to allocate the full 255 (although I have not checked the source code to prove this, it is only a hypothetical).
- Indexes, this will be immediately obvious if you try to make a composite key from a lot of varchar(255) columns.
- Intermediate tables and possibly result sets. Given the way transactions work, it might not always be possible for something to use the actual max length of strings in a column as opposed to the defined limit.
- Internal predictive optimisations might take the max length as an input.
- Changes in database implementation versions.
As a rule of thumb there's really no need for a varchar to be longer than it needs to be anyway, performance issues or not, so I recommend sticking to that when you can. Taking more effort to sample the size of your data, enforce a true limit, or find out the true limit through asking/research is the ideal approach.
When you can't, and you want to use something such as varchar(255) for cases when in doubt, then I recommend doing the science. This might consist of duplicating the table, reducing the size of the varchar column, then copying the data into it from the original and looking at the size of index/row data (index the column as well, and also try it as a primary key, which might behave differently in InnoDB as rows are ordered by primary key). At the very least this way you'll know if you have an impact on IO, which tends to be one of the most sensitive bottlenecks. Testing for memory usage is more difficult; it's hard to test that exhaustively. I would recommend testing potential worst cases (queries with lots of intermediate in-memory results, checking with EXPLAIN for large temp tables, etc).
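One way to set up that experiment is sketched below as a small Python helper that only builds the SQL text; it does not connect to a database. The helper name, the `_narrow` suffix, and the table and column names in the usage line are all hypothetical. Note the simplifications: identifiers are not escaped, the MODIFY clause as written drops attributes such as NOT NULL or an explicit character set, and the sizes reported by information_schema are estimates for InnoDB.

```python
from typing import List


def shrink_experiment_sql(table: str, column: str, new_len: int) -> List[str]:
    """Build statements that copy `table` with a narrower varchar `column`."""
    copy = f"{table}_narrow"  # hypothetical name for the test copy
    return [
        # 1. Empty copy with the same columns and indexes.
        f"CREATE TABLE {copy} LIKE {table};",
        # 2. Narrow the column while the copy is still empty.
        f"ALTER TABLE {copy} MODIFY {column} VARCHAR({new_len});",
        # 3. Copy the data; in strict mode this fails if any value is too long.
        f"INSERT INTO {copy} SELECT * FROM {table};",
        # 4. Compare row and index sizes of the two tables.
        "SELECT table_name, data_length, index_length "
        "FROM information_schema.tables "
        f"WHERE table_schema = DATABASE() AND table_name IN ('{table}', '{copy}');",
    ]


for statement in shrink_experiment_sql("messages", "subject", 64):
    print(statement)
```

Running the generated statements against a test copy of the database, never production, gives the data_length and index_length figures to compare.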
If you know there aren't going to be many rows in the table and you aren't going to use the column for joins, indexes (especially composite, unique), etc., then you most likely won't have many problems.
Answer by Rob Farley
Being varchar, rather than just char, the size is based on an internal field to indicate its actual length and the string itself. So using varchar(200) is not very different to using varchar(150), except that you have the potential to store more.
And you should consider what happens on an update, when a row grows. But if this is rare, then you should be fine.