Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/2023481/
MySQL: Large VARCHAR vs. TEXT?
Asked by Tom
I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.
Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.
Accepted answer by MindStalker
TEXT and BLOB may be stored off the table, with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, column size, row_format, and MySQL version.
VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable; which one is faster overall depends upon your data and your hardware, so you'd want to benchmark a real-world scenario with your data.
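A benchmark along those lines needs two otherwise identical tables. The following is a minimal sketch, not the asker's real schema: the table names, column names, and the utf8mb4 character set are all assumptions made for illustration.

```sql
-- Hypothetical schema for comparing the two options; every name here is illustrative.
CREATE TABLE messages_varchar (
  id           BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sender_id    INT UNSIGNED     NOT NULL,
  recipient_id INT UNSIGNED     NOT NULL,
  message_type TINYINT UNSIGNED NOT NULL,
  body         VARCHAR(3000)    NOT NULL  -- limit is counted in characters
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE messages_text (
  id           BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sender_id    INT UNSIGNED     NOT NULL,
  recipient_id INT UNSIGNED     NOT NULL,
  message_type TINYINT UNSIGNED NOT NULL,
  body         TEXT             NOT NULL  -- limit is 65535 bytes
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

Loading the same sample of real messages into both tables and timing the queries your application actually runs is the kind of comparison the answer suggests.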
Answer by Michael J. Calkins
Can you predict how long the user input would be?
VARCHAR(X)
Case: user name, email, country, subject, password
TEXT
Case: messages, emails, comments, formatted text, HTML, code, images, links
MEDIUMTEXT
Case: large JSON bodies, short to medium length books, CSV strings
LONGTEXT
Case: textbooks, programs, years of log files, Harry Potter and the Goblet of Fire, scientific research logging
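For reference, the TEXT variants differ in their maximum length in bytes (TINYTEXT, not listed above, holds up to 255 bytes). A sketch with a hypothetical table, where the table and column names are invented for illustration:

```sql
-- Hypothetical table showing one column per size class.
CREATE TABLE text_type_examples (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username     VARCHAR(64)  NOT NULL,  -- short, predictable length
  user_comment TEXT,                   -- up to 65,535 bytes (64 KiB - 1)
  json_payload MEDIUMTEXT,             -- up to 16,777,215 bytes (16 MiB - 1)
  log_archive  LONGTEXT                -- up to 4,294,967,295 bytes (4 GiB - 1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```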
Answer by James
Just to clarify the best practice:
Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)
String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).
I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.
What is it about the messages themselves that forces them to never be more than 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.
Answer by Michael Anderson
Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.
I think TEXT is stored outside the MySQL row, while I think VARCHAR is stored as part of the row. There is a maximum row length for MySQL rows, so using VARCHAR can limit how much other data you can store in a row.
Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.
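The row-length limit mentioned above can be seen with a pair of table definitions adapted from the MySQL reference manual's discussion of row size limits. This is only a sketch: the table names are arbitrary, and the exact error text may vary by version.

```sql
-- Seven VARCHAR columns totalling 66,000 single-byte characters exceed the
-- 65,535-byte row size limit, so this statement is expected to fail
-- (error 1118, "Row size too large").
CREATE TABLE row_limit_varchar (
  a VARCHAR(10000), b VARCHAR(10000), c VARCHAR(10000),
  d VARCHAR(10000), e VARCHAR(10000), f VARCHAR(10000),
  g VARCHAR(6000)
) ENGINE=InnoDB CHARACTER SET latin1;

-- Declaring the last column as TEXT lets the definition fit, because a TEXT
-- column contributes only a few bytes to the size counted against that limit.
CREATE TABLE row_limit_text (
  a VARCHAR(10000), b VARCHAR(10000), c VARCHAR(10000),
  d VARCHAR(10000), e VARCHAR(10000), f VARCHAR(10000),
  g TEXT
) ENGINE=InnoDB CHARACTER SET latin1;
```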
Answer by Rick James
Short answer: No practical, performance, or storage difference.
Long answer:
There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
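The byte/character distinction can be checked directly with the built-in CHAR_LENGTH() and LENGTH() functions. A small sketch, assuming the client connection uses utf8mb4 so that the literal arrives intact:

```sql
SELECT
  CHAR_LENGTH(CONVERT('naïve' USING utf8mb4)) AS characters,  -- counts characters
  LENGTH(CONVERT('naïve' USING utf8mb4))      AS bytes;       -- counts bytes

-- Worst case for utf8mb4, where a character can take up to 4 bytes:
-- the number of characters guaranteed to fit in a 65535-byte TEXT column.
SELECT 65535 DIV 4 AS guaranteed_characters;  -- 16383
```

Under utf8mb4 the 'ï' occupies two bytes, so the byte count comes out one higher than the character count.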
For smaller limits in VARCHAR, there are some advantages over TEXT.
- "smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
- INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
- Intermediate tables created by complex SELECTs are handled in two different ways: MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
- Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
- VARBINARY is like VARCHAR; BLOB is like TEXT.
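The index limits in the bullets above are why 191 shows up so often: with utf8mb4 a character can need 4 bytes, and 191 × 4 = 764 fits under a 767-byte limit. A sketch with a hypothetical table (all names invented for illustration); note that in a regular (non-FULLTEXT) index a TEXT column needs an explicit prefix length:

```sql
-- Hypothetical table; names are illustrative.
CREATE TABLE messages_indexed (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  subject VARCHAR(191)    NOT NULL,
  body    TEXT            NOT NULL,
  INDEX idx_subject (subject),       -- whole column fits: 191 * 4 = 764 bytes
  INDEX idx_body_prefix (body(191))  -- only the first 191 characters are indexed
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```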
Rebuttal to other answers
The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.
When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.
The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this.
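Since the row format influences where large values end up, it can help to check which one a table uses. A minimal sketch, assuming a table named messages in the current database (the name is hypothetical); on older servers, switching to DYNAMIC may require additional server settings:

```sql
-- Show the row format currently used by the (hypothetical) messages table.
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'messages';

-- Switch the table to the DYNAMIC row format (this rebuilds the table).
ALTER TABLE messages ROW_FORMAT=DYNAMIC;
```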
Answer by Max
The preceding answers don't insist enough on the main problem: even in very simple queries like
(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id)
a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if your table has, say, 500 000 rows with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 bytes. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.
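The size estimate above can be reproduced with plain arithmetic. This sketch assumes one byte per character and one fixed-width slot per row, which is the worst case the answer describes:

```sql
-- 500,000 rows, each padded to 65,000 bytes in the temporary table.
SELECT 500000 * 65000              AS worst_case_bytes,  -- 32500000000
       500000 * 65000 / 1000000000 AS worst_case_gb;     -- about 32.5
```

With a multi-byte character set the figure would be correspondingly larger.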
Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/ (This refers to the handling of TEXT vs VARCHAR in the "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
Answer by Viktor Joras
There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR fields are stored inline while TEXT fields are stored offline; only pointers to the TEXT data are actually stored in the records.
If you have to index your field for faster search, update or delete, then go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field because these two data types are different in nature.
- If you use your field only for archiving
- you don't care about data retrieval speed
- you care about speed but you will use the operator '%LIKE%' in your search query so indexing will not help much
- you can't predict a limit of the data length
then go for TEXT.
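The point about '%LIKE%' searches can be checked with EXPLAIN. A sketch with a hypothetical table (names and sample patterns are invented); the exact plan output depends on the MySQL version and the data:

```sql
-- Hypothetical table; names are illustrative.
CREATE TABLE users_example (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(191) NOT NULL,
  bio   TEXT,
  INDEX idx_email (email)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Prefix match: the optimizer can use idx_email to narrow the search.
EXPLAIN SELECT id FROM users_example WHERE email LIKE 'tom@%';

-- Leading wildcard: the index cannot narrow the search, so every entry is examined.
EXPLAIN SELECT id FROM users_example WHERE email LIKE '%@example.com';
```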
Answer by Creative87
Varchar is for small data like email addresses, while Text is for much bigger data like news articles, and Blob is for binary data such as images.
The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if the data is too big, like varchar(4000) for example.
Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.
Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.
Follow these tips for maximum speed and performance:
Use varchar for name, titles, emails
Use Text for large data
Separate text in different tables
Use Left Join queries on an ID such as a phone number
If you are going to use Blob apply the same tips as in Text
This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.
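The "separate table plus LEFT JOIN" tip could look roughly like the following. This is a sketch under assumed names (message_headers, message_bodies, and their columns are all hypothetical), not a schema taken from the answer:

```sql
-- Small, frequently scanned columns live in one table...
CREATE TABLE message_headers (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sender_id INT UNSIGNED    NOT NULL,
  subject   VARCHAR(255)    NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- ...and the large text lives in a second table keyed by the same id.
CREATE TABLE message_bodies (
  message_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  body       TEXT            NOT NULL,
  FOREIGN KEY (message_id) REFERENCES message_headers (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Fetch the body only when it is actually needed.
SELECT h.id, h.subject, b.body
FROM message_headers AS h
LEFT JOIN message_bodies AS b ON b.message_id = h.id
WHERE h.id = 42;
```

Listing queries can then read message_headers alone, without touching the text data.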