MySQL TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT maximum storage sizes

Question

Asked by Lalith B

Per the MySQL docs, there are four TEXT types:

TINYTEXT
TEXT
MEDIUMTEXT
LONGTEXT

What is the maximum length that I can store in a column of each data type assuming the character encoding is UTF-8?

Answer 1

Answer by Bridge

From the documentation:

      Type | Maximum length
-----------+-------------------------------------
  TINYTEXT |           255 (2⁸−1) bytes
      TEXT |        65,535 (2¹⁶−1) bytes = 64 KiB
MEDIUMTEXT |    16,777,215 (2²⁴−1) bytes = 16 MiB
  LONGTEXT | 4,294,967,295 (2³²−1) bytes =  4 GiB

Note that the number of characters that can be stored in your column depends on the character encoding, since a single character may take more than one byte.
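
A minimal Python sketch of that arithmetic, assuming the byte limits from the table above and measuring a value by its encoded byte length (the 1 to 4 byte length prefix each type also stores is not counted). The helpers `fits` and `max_chars` are hypothetical names for illustration, not part of any MySQL API; a `max_bytes_per_char` of 3 corresponds to MySQL's `utf8` (utf8mb3) character set and 4 to `utf8mb4`.

```python
# Maximum data length in bytes for each MySQL TEXT type (2^n - 1).
# The length prefix stored alongside the data is not counted here.
TEXT_MAX_BYTES = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}


def fits(value: str, column_type: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if the encoded value is within the type's byte limit."""
    return len(value.encode(encoding)) <= TEXT_MAX_BYTES[column_type]


def max_chars(column_type: str, max_bytes_per_char: int) -> int:
    """Worst-case character count if every character uses max_bytes_per_char bytes."""
    return TEXT_MAX_BYTES[column_type] // max_bytes_per_char


if __name__ == "__main__":
    print(fits("a" * 255, "TINYTEXT"))       # 255 one-byte characters: fits
    print(fits("\u00e9" * 128, "TINYTEXT"))  # 128 x e-acute = 256 UTF-8 bytes: too long
    for name in TEXT_MAX_BYTES:
        print(name, max_chars(name, 3), max_chars(name, 4))
```

With 3 bytes per character, TEXT is guaranteed to hold 21,845 characters; with 4 bytes per character, 16,383.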

Answer 2

Answer by Ankan-Zerob

Expansion of the same answer

This SO post outlines the overheads and storage mechanisms in detail.
As noted in point (1) of that post, a VARCHAR should always be used instead of TINYTEXT. However, when using VARCHAR, the maximum row size must not exceed 65,535 bytes.
As outlined here http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-utf8.html, MySQL's utf8 character set uses at most 3 bytes per character (the utf8mb4 character set uses up to 4).
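
To see what the 65,535-byte row limit means for a VARCHAR column, here is a back-of-the-envelope Python sketch. It assumes a table whose only column is a single NOT NULL VARCHAR, so the only overhead is the 2-byte length prefix VARCHAR uses when a column can exceed 255 bytes; the function name `max_varchar_chars` is hypothetical.

```python
MAX_ROW_BYTES = 65_535  # MySQL row-size limit shared by all columns of a row
LENGTH_PREFIX = 2       # VARCHAR columns that can exceed 255 bytes store a 2-byte length


def max_varchar_chars(max_bytes_per_char: int) -> int:
    """Largest N for VARCHAR(N), assuming it is the table's only column and is NOT NULL."""
    return (MAX_ROW_BYTES - LENGTH_PREFIX) // max_bytes_per_char


if __name__ == "__main__":
    for charset, width in [("latin1", 1), ("utf8mb3", 3), ("utf8mb4", 4)]:
        print(charset, max_varchar_chars(width))
```

For utf8mb3 this gives 21,844 characters, the figure the MySQL manual gives for that character set; additional or nullable columns leave less room.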

THIS IS A ROUGH ESTIMATION TABLE FOR QUICK DECISIONS!

The estimates range from the worst case (3 bytes per UTF-8 character) to the best case (1 byte per UTF-8 character).
Assume that English has an average of 4.5 letters per word.
x is the number of bytes allocated.

      Type | A = worst case (x/3) | B = best case (x) | words estimate (A/4.5) - (B/4.5)
-----------+----------------------+-------------------+---------------------------------
  TINYTEXT |                   85 |               255 | 18 - 56
      TEXT |               21,845 |            65,535 | 4,854.44 - 14,563.33
MEDIUMTEXT |            5,592,405 |        16,777,215 | 1,242,756.6 - 3,728,270
  LONGTEXT |        1,431,655,765 |     4,294,967,295 | 318,145,725.5 - 954,437,176.6
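
The arithmetic behind this table can be reproduced with a few lines of Python. This is a sketch: the 3-bytes-per-character worst case and the 4.5 letters per word are the assumptions stated above, spaces between words are ignored as in the table, and `estimate` is a hypothetical helper name.

```python
# Byte limits of the four TEXT types (2^n - 1).
TEXT_MAX_BYTES = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}
WORST_BYTES_PER_CHAR = 3    # MySQL utf8 (utf8mb3) worst case
AVG_LETTERS_PER_WORD = 4.5  # assumption used in the table


def estimate(x: int) -> tuple[float, int, float, float]:
    """Return (A, B, A/4.5, B/4.5) for a column of x bytes."""
    a = x / WORST_BYTES_PER_CHAR  # A: every character takes 3 bytes
    b = x                         # B: every character takes 1 byte
    return a, b, a / AVG_LETTERS_PER_WORD, b / AVG_LETTERS_PER_WORD


if __name__ == "__main__":
    for name, x in TEXT_MAX_BYTES.items():
        a, b, words_lo, words_hi = estimate(x)
        print(f"{name:>10} | {a:>13,.0f} | {b:>13,} | {words_lo:,.1f} - {words_hi:,.1f}")
```

The MEDIUMTEXT worst case works out to 16,777,215 / 3 = 5,592,405 characters.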

Please refer to Chris V's answer as well: https://stackoverflow.com/a/35785869/1881812

Answer 3

Answer by ChrisV

Rising to @Ankan-Zerob's challenge, here is my estimate of the maximum length, measured in words, that can be stored in each text type:

      Type |         Bytes | English words | Multi-byte words
-----------+---------------+---------------+-----------------
  TINYTEXT |           255 |           ±44 |              ±23
      TEXT |        65,535 |       ±11,000 |           ±5,900
MEDIUMTEXT |    16,777,215 |    ±2,800,000 |       ±1,500,000
  LONGTEXT | 4,294,967,295 |  ±740,000,000 |     ±390,000,000

In English, 4.8 letters per word is probably a good average (e.g. norvig.com/mayzner.html), though word lengths vary by domain (e.g. spoken language vs. academic papers), so there's no point being too precise. English is mostly single-byte ASCII characters with very occasional multi-byte characters, so it is close to one byte per letter. An extra character has to be allowed for the space between words, which gives 5.8 bytes per word; I've divided by that and rounded. Languages with lots of accents, such as Polish, would store slightly fewer words, as would languages with longer words, such as German.

Languages requiring multi-byte characters, such as Greek, Arabic and Hebrew, typically need two bytes per character in UTF-8 (Hindi and Thai actually need three, so they would store fewer words still). Guessing wildly at 5 letters per word, plus one byte for the space, gives 11 bytes per word, and again I've rounded the results.
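
Both word columns are just the byte limit divided by an assumed number of bytes per word. A minimal Python sketch of that division, assuming 5.8 bytes per English word (4.8 one-byte letters plus a space) and 11 bytes per multi-byte word (5 two-byte letters plus a space); `max_words` is a hypothetical helper, and `Fraction` is used only so that 5.8 is divided exactly rather than as a binary float.

```python
from fractions import Fraction

# Byte limits of the four TEXT types (2^n - 1).
TEXT_MAX_BYTES = {
    "TINYTEXT": 2**8 - 1,
    "TEXT": 2**16 - 1,
    "MEDIUMTEXT": 2**24 - 1,
    "LONGTEXT": 2**32 - 1,
}

# Assumed bytes per word, each including one byte for the following space.
ENGLISH_BYTES_PER_WORD = Fraction("5.8")  # ~4.8 one-byte letters + 1 space
MULTIBYTE_BYTES_PER_WORD = Fraction(11)   # ~5 two-byte letters + 1 space


def max_words(max_bytes: int, bytes_per_word: Fraction) -> int:
    """Whole words that fit in max_bytes (exact division, rounded down)."""
    return max_bytes // bytes_per_word


if __name__ == "__main__":
    for name, limit in TEXT_MAX_BYTES.items():
        english = max_words(limit, ENGLISH_BYTES_PER_WORD)
        multibyte = max_words(limit, MULTIBYTE_BYTES_PER_WORD)
        print(f"{name:>10} | {english:>13,} | {multibyte:>13,}")
```

For LONGTEXT this gives 740,511,602 English words and 390,451,572 multi-byte words before rounding.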

I know nothing of CJK scripts (Hanzi, Kanji, Hiragana, Katakana, etc.); I believe their characters mostly require 3 bytes in UTF-8 and, with massive simplification, they might be considered to use around 2 characters per word, so they would fall somewhere between the other two columns. (CJK text is likely to require less storage in UTF-16, depending on the content.)
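
Rather than guess per-script byte counts, Python can encode sample characters directly. A small sketch; the sample characters are arbitrary picks, one per script, written as escapes, and a single character says nothing about average word length. It shows Greek, Hebrew and Arabic letters taking two bytes in UTF-8, Devanagari (Hindi), Thai and CJK characters taking three, and all of these characters taking two bytes in UTF-16.

```python
# One sample character per script (escapes keep the source ASCII-only).
SAMPLES = {
    "English": "a",
    "Polish": "\u0142",    # l with stroke
    "Greek": "\u03b1",     # alpha
    "Hebrew": "\u05d0",    # alef
    "Arabic": "\u0628",    # beh
    "Hindi": "\u0915",     # Devanagari ka
    "Thai": "\u0e01",      # ko kai
    "Chinese": "\u4e2d",   # zhong
    "Japanese": "\u3042",  # hiragana a
}


def byte_counts(ch: str) -> tuple[int, int]:
    """Bytes for one character in UTF-8 and in UTF-16 (little-endian, no byte-order mark)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


if __name__ == "__main__":
    for script, ch in SAMPLES.items():
        utf8, utf16 = byte_counts(ch)
        print(f"{script:>9} U+{ord(ch):04X} utf-8={utf8} utf-16={utf16}")
```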

This is of course ignoring storage overheads etc.
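
One overhead that is easy to account for is the length prefix: according to MySQL's storage-requirements documentation, TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT store a value of L bytes in L + 1, L + 2, L + 3 and L + 4 bytes respectively. A Python sketch that picks the smallest type for a value and reports that size; `smallest_text_type` is a hypothetical name, and InnoDB page and off-page overheads are not modelled.

```python
# (type, maximum data bytes, length-prefix bytes)
TEXT_TYPES = [
    ("TINYTEXT", 2**8 - 1, 1),
    ("TEXT", 2**16 - 1, 2),
    ("MEDIUMTEXT", 2**24 - 1, 3),
    ("LONGTEXT", 2**32 - 1, 4),
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> tuple[str, int]:
    """Return the smallest TEXT type that can hold value, and its size L + prefix."""
    data_bytes = len(value.encode(encoding))
    for name, max_bytes, prefix in TEXT_TYPES:
        if data_bytes <= max_bytes:
            return name, data_bytes + prefix
    raise ValueError(f"{data_bytes} bytes is too large even for LONGTEXT")


if __name__ == "__main__":
    print(smallest_text_type("hello"))     # 5 data bytes + 1 prefix byte
    print(smallest_text_type("x" * 300))   # too long for TINYTEXT
```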

Answer 4

Answer by colin0117

This is nice but doesn't answer the question:

"A VARCHAR should always be used instead of TINYTEXT." TINYTEXT is useful if you have wide rows, since the data is stored off the record. There is a performance overhead, but it does have a use.
