SQL: Why not use varchar(max)?
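Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/7141402/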
Why not use varchar(max)?
Asked by AtaLoss
I'm a bit old school when it comes to database design, so I'm totally for using the correct data sizes in columns. However, when reviewing a database for a friend, I noticed he used varchar(max) a lot. Now, my immediate thought was to throw it back to him and tell him to change it. But then I thought about it and couldn't come up with a good reason for him not to use it (he'd used a CASE-type tool to generate the db, if you're wondering).
I've been researching the topic of varchar(max) usage and I can't really come up with any good reason for him not to use it.
He doesn't use the columns for indexes, the application that sits on the db has limitations on the input, so it won't allow massive entries in the fields.
Any help would be appreciated to help me make him see the light :).
Accepted answer by Russ Clarke
My answer to this isn't about the usage of MAX so much as it is about the reasons for choosing VARCHAR(max) vs TEXT.
In my book, first of all: unless you can be absolutely certain that you'll never encode anything but English text and that people won't refer to names of foreign locations, you should use NVARCHAR or NTEXT.
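As a rough illustration of the VARCHAR vs NVARCHAR point, here is a minimal T-SQL sketch. The variable names are hypothetical, and the behaviour of the VARCHAR assignment assumes a database whose default collation uses a Western single-byte code page; under a Cyrillic or UTF-8 collation the result would differ.

```sql
-- NVARCHAR stores UTF-16, so these ASCII characters take two bytes each;
-- VARCHAR takes one byte each under a single-byte code page.
SELECT DATALENGTH('abc')  AS varchar_bytes,
       DATALENGTH(N'abc') AS nvarchar_bytes;

-- Hypothetical variables for illustration only.
DECLARE @city_v VARCHAR(20)  = N'Москва';  -- converted to the collation's code page
DECLARE @city_n NVARCHAR(20) = N'Москва';  -- stored as Unicode

-- Assumption: with a Western code page, characters that cannot be represented
-- in @city_v are replaced (typically with '?'), while @city_n keeps them.
SELECT @city_v AS varchar_value,
       @city_n AS nvarchar_value;
```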
Secondly, it's what the fields allow you to do.
TEXT is hard to update in comparison to VARCHAR, but you get the advantage of Full Text Indexing and lots of clever things.
On the other hand, VARCHAR(MAX) has some ambiguity: if the size of the cell is < 8000 chars, it will be treated as row data. If it's greater, it will be treated as a LOB for storage purposes. Because you can't know this without querying RBAR (row by agonizing row), this may affect optimization strategies in places where you need to be sure about your data and how many reads it costs.
Otherwise, if your usage is relatively mundane and you don't expect to have problems with the size of data (i.e. you're using .NET and therefore don't have to be concerned about the size of your string/char* objects), then using VARCHAR(max) is fine.
Answer by Shiraz Bhaiji
There is a blog post about why not to use varchar max here
Edit
The basic difference is where the data is stored. A SQL Server data row has a maximum size of roughly 8 KB (8,060 bytes), so a 2 GB varchar(max) value cannot be stored in the data row. SQL Server stores it "out of row".
Therefore you could get a performance hit since the data will not be in the same place on disk, see: http://msdn.microsoft.com/en-us/library/ms189087.aspx
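One way to see the in-row vs out-of-row split for yourself is to look at a table's allocation units. The following is a minimal sketch, assuming a scratch database and a hypothetical table `dbo.DocDemo`; page counts will vary, so none are shown.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.DocDemo
(
    DocId INT IDENTITY(1,1) PRIMARY KEY,
    Body  VARCHAR(MAX) NULL
);

-- A short value fits in the data row.
INSERT INTO dbo.DocDemo (Body) VALUES (REPLICATE('a', 100));

-- A long value cannot fit in the row. The CAST matters: without it,
-- REPLICATE truncates its result at 8,000 bytes.
INSERT INTO dbo.DocDemo (Body)
VALUES (REPLICATE(CAST('a' AS VARCHAR(MAX)), 20000));

-- IN_ROW_DATA holds the rows themselves; LOB_DATA holds values stored out of row.
SELECT au.type_desc, au.total_pages, au.used_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
  ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
  OR (au.type = 2       AND au.container_id = p.partition_id)
WHERE p.object_id = OBJECT_ID('dbo.DocDemo');
```

Running the final query before and after the second insert should show the LOB_DATA allocation unit starting to consume pages.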
Answer by Scott Johnston
If you are working in an OLTP environment, you are all about performance, from overhead and tuning concerns to indexing limitations and query bottlenecks. Using a varchar(max) or any other LOB type will most likely contravene most design best practices. So unless there is a specific business need that cannot be handled by some other type and only a varchar(max) will fit the bill, why subject your system and applications to the kind of overhead and performance issues inherent in the LOB datatypes?
If, on the other hand, you are working in an OLAP environment or in a star schema DW environment whose dimension tables have descriptor fields that naturally need to be verbose, then a varchar(max) may be useful, as long as you are not adding it to an index. Even then, I would still recommend a char(x) or varchar(x), as it is always a best practice to use only the resources you absolutely need to get the job done.
Answer by HLGEM
They should NOT be used unless you expect large amounts of data and here is the reason why (directly from Books Online):
Columns that are of the large object (LOB) data types ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified as key columns for an index.
If you want to cripple performance, use nvarchar for everything.
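The Books Online restriction quoted in this answer can be sketched in T-SQL as follows. The table and index names are hypothetical, and the failing statement is left commented out so the script runs end to end.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.CustomerDemo
(
    CustomerId INT IDENTITY(1,1) PRIMARY KEY,
    Email      VARCHAR(320) NOT NULL,
    Notes      VARCHAR(MAX) NULL
);

-- Works: a bounded varchar column can be an index key column.
CREATE INDEX IX_CustomerDemo_Email
    ON dbo.CustomerDemo (Email);

-- Fails if uncommented: a varchar(max) column is not a valid index key column.
-- CREATE INDEX IX_CustomerDemo_Notes
--     ON dbo.CustomerDemo (Notes);

-- Works: varchar(max) may still ride along as a non-key INCLUDE column.
CREATE INDEX IX_CustomerDemo_Email_Notes
    ON dbo.CustomerDemo (Email)
    INCLUDE (Notes);
```

So the restriction concerns searching and sorting on the column through an index key; a varchar(max) column can still be covered by an index as included data.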
Answer by Donny V.
Redgate wrote a great article about this.
https://www.red-gate.com/simple-talk/sql/database-administration/whats-the-point-of-using-varcharn-anymore/
Conclusions
- Where appropriate, use VARCHAR(n) over VARCHAR(MAX) for reasons of good design if not performance benefits, and because VARCHAR(MAX) data does not compress
- Storing large strings takes longer than storing small strings.
- Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
- Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
- Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.
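The "out-of-row option" mentioned in these conclusions is a per-table setting. A minimal sketch of turning it on and checking it, assuming a hypothetical table `dbo.ArticleDemo`:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.ArticleDemo
(
    ArticleId INT IDENTITY(1,1) PRIMARY KEY,
    Body      VARCHAR(MAX) NULL
);

-- Default (0): values up to 8,000 bytes are kept in the row when they fit.
-- Setting the option to 1 stores varchar(max) values out of row,
-- leaving only a 16-byte pointer in the row.
EXEC sp_tableoption 'dbo.ArticleDemo', 'large value types out of row', 1;

-- Confirm the current setting for the table.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = 'ArticleDemo';
```

Note that changing the option does not immediately move existing values; they are relocated as they are subsequently updated.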
Answer by Vlad Kirov
The difference is as follows: VARCHAR(X) can be indexed and is stored in the MDF/NDF data file. VARCHAR(MAX) can't be indexed because it can reach a high volume, and it will then be stored as a separate file and not in the MDF/NDF data file.
Answer by Stephen Lauzon
It is somewhat old-fashioned to believe that the application will only pass short strings to the database, and that will make it okay.
In modern times, you HAVE to anticipate that the database will be accessed primarily by the current application, but there may be a future version of the application (will the developer of that version know to keep strings below a certain length?).
You MUST anticipate that web services, ETL processes, LINQ to SQL, and any number of other already existing and/or not-yet-existing technologies will be used to access your database.
Generally speaking, I try not to go over varchar(4000), because it's four thousand characters, after all. If I exceed that, then I look to other datatypes to store whatever it is I am trying to store. Brent Ozar has written some pretty great stuff on this.
All that said, it is important to evaluate the current design's approach to your current requirements when you are working on a project. Have an idea of how the various parts work, understand the trade-offs of various approaches and solve the problem at hand. Exercising some great axiom can lead to blind adherence, which might turn you into a lemming.
Answer by at.
I don't know how SQL Server handles large (declared) varchar fields from a performance, memory and storage perspective, but assuming it does so as efficiently as smaller declared varchar fields, there's still the benefit of integrity constraints.
The application sitting on the db is supposed to have limits on the input, but the database can properly report an error if the application has a bug in this respect.
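To make the integrity point concrete, here is a minimal sketch of a declared length acting as a last line of defence. The table name is hypothetical, and the behaviour assumes the default `ANSI_WARNINGS ON` session setting.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE dbo.PersonDemo
(
    PersonId  INT IDENTITY(1,1) PRIMARY KEY,
    FirstName VARCHAR(50) NOT NULL
);

-- Succeeds: the value fits within the declared 50 characters.
INSERT INTO dbo.PersonDemo (FirstName) VALUES ('Ada');

-- Fails with a "String or binary data would be truncated" error:
-- the value is 51 characters, one over the declared limit.
INSERT INTO dbo.PersonDemo (FirstName) VALUES (REPLICATE('x', 51));
```

Had the column been declared as VARCHAR(MAX), the second insert would have succeeded silently and the application bug would have gone unnoticed.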