MySQL 将 base64 编码数据存储为 BLOB 或 TEXT 数据类型
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/14042599/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Storing base64 encoded data as BLOB or TEXT datatype
提问by Panagiotis Moustafellos
We have a MySQL InnoDB table holding ~10 columns of small base64 encoded javascript files and png (<2KB size) images base64 encoded as well.
我们有一个 MySQL InnoDB 表,其中包含约 10 列的小型 base64 编码 javascript 文件和 png(<2KB 大小)图像 base64 编码。
There are few inserts and a lot of reads comparatively, however the output is being cached on a Memcached instance for some minutes to avoid subsequent reads.
相对而言,插入较少,读取较多,但是输出会在 Memcached 实例上缓存几分钟,以避免后续读取。
As it is right now we are using BLOB
for those columns, but I am wondering if there is an advantage in switching to TEXT
datatype in terms of performance or snapshot backing up.
现在我们正在使用BLOB
这些列,但我想知道TEXT
在性能或快照备份方面切换到数据类型是否有优势。
My search digging indicates that BLOB
and TEXT
for my case are close to identical and since I do not know before-hand what type of data are actually going to be stored I went for BLOB
.
我的搜索挖掘表明,BLOB
和TEXT
我的情况接近相同的,因为我不知道前手什么类型的数据实际上将被存储我去了BLOB
。
Do you have any pointers on the TEXT vs BLOB debate for this specific case?
对于此特定案例,您是否有关于 TEXT 与 BLOB 辩论的任何指示?
回答by eggyal
One shouldn't store Base64-encoded data in one's database...
人们不应该在自己的数据库中存储 Base64 编码的数据......
Base64 is a means of representing arbitrary binary data using only printable text characters: it was designed for situations where one needs to transfer such binary data across a protocol or medium that can handle only printable-text (e.g. SMTP/email). It increases the data size (by 33%) and adds the computational cost of encoding/decoding, so it should be avoided unless absolutely necessary.
Base64 是一种仅使用可打印文本字符表示任意二进制数据的方法:它是为需要通过只能处理可打印文本(例如 SMTP/电子邮件)的协议或介质传输此类二进制数据的情况而设计的。它增加了数据大小(增加了 33%)并增加了编码/解码的计算成本,因此除非绝对必要,否则应避免使用。
By contrast, the whole point of BLOB
columns is that they store raw binary strings. So just go ahead and store your stuff directly into your BLOB
columns without first Base64-encoding them. Usually you'll want to store related metadata in other columns, such as file version/last modified date, media type, and (in the case of text files, such as JavaScript sources) character encoding. You might decide to use TEXT
type columns for the text files, not only so that MySQL will natively track character encoding for you, but also so that it can transcode to alternative character sets and/or inspect/manipulate the text as may be required (now or in the future).
相比之下,列的全部意义BLOB
在于它们存储原始二进制字符串。因此,只需继续将您的内容直接存储到您的BLOB
列中,而无需先对它们进行 Base64 编码。通常,您需要将相关元数据存储在其他列中,例如文件版本/上次修改日期、媒体类型和(对于文本文件,例如 JavaScript 源)字符编码。您可能决定TEXT
为文本文件使用类型列,不仅这样 MySQL 会在本地为您跟踪字符编码,而且还可以将其转码为替代字符集和/或根据需要检查/操作文本(现在或将来)。
The (erroneous) idea that SQL databases require printable-text encodings like Base64 for handling arbitrary binary data has been perpetuated by a large number of ill-informed tutorials. This idea appears to be seated in the mistaken belief that, because SQL comprises only printable-text in other contexts, it must surely require it for binary data too (at least for data transfer, if not for data storage). This is simply not true: SQL can convey binary data in a number of ways, including plain string literals (provided that they are properly quoted and escaped like any other string); of course, the preferred way to pass data (of any type) to your database is through parameterised queries, and parameters can just as easily contain binary data as they can anything else.
SQL 数据库需要像 Base64 这样的可打印文本编码来处理任意二进制数据的(错误的)想法已经被大量信息不足的教程所延续。这个想法似乎是基于错误的信念,因为 SQL 仅包含其他上下文中的可打印文本,它肯定也需要二进制数据(至少对于数据传输,如果不是用于数据存储)。这根本不是真的:SQL 可以通过多种方式传送二进制数据,包括纯字符串文字(前提是它们像任何其他字符串一样被正确引用和转义);当然,将数据(任何类型)传递到数据库的首选方法是通过参数化查询,并且参数可以像包含其他任何内容一样轻松地包含二进制数据。
For what it's worth, I usually altogether avoid storing items like this in the RDBMS and prefer instead to use those highly optimised file storage databases known as filesystems—but that's another matter altogether.
对于它的价值,我通常完全避免在 RDBMS 中存储这样的项目,而是更喜欢使用那些被称为文件系统的高度优化的文件存储数据库——但这完全是另一回事。
...unless it's cached for performance reasons...
...除非出于性能原因将其缓存...
The only situation in which there might be some benefit from storing Base64-encoded data is where data is frequently retrieved from the database and transmitted across a protocol that requires that encoding—in which case, storing the Base64-encoded representation would save from having to perform the encoding operation on the otherwise raw data upon every fetch.
存储 Base64 编码数据可能带来一些好处的唯一情况是经常从数据库中检索数据并通过需要该编码的协议传输的情况 - 在这种情况下,存储 Base64 编码表示将不必在每次获取时对其他原始数据执行编码操作。
However, note in this sense that the Base64-encoded storage is merely acting as a cache, much like one might store denormalised data for performance reasons.
但是,请注意,Base64 编码的存储仅用作缓存,就像出于性能原因可能存储非规范化数据一样。
...in which case it should be TEXT
not BLOB
......在这种情况下,它应该是TEXT
不BLOB
As alluded to above, the difference between TEXT
and BLOB
really comes down to the fact that TEXT
columns are stored together with text-specific metadata (such as character encodingand collation), whereas BLOB
columns are not. This additional metadata enables MySQL to transcode characters between storage and connection character sets (where appropriate) and perform fancy character equivalence/ordering.
如上面提到的,之间的差异TEXT
和BLOB
真的可以归结为这样一个事实TEXT
列与文本特定的元数据(如存储在一起的字符编码和核对),而BLOB
列不可。这个额外的元数据使 MySQL 能够在存储和连接字符集(在适当的情况下)之间对字符进行转码,并执行花哨的字符等效/排序。
Generally speaking: if two clients working in different character sets should see the same bytes, then you want a BLOB
column; if they should see the same charactersthen you want a TEXT
column.
一般而言:如果使用不同字符集的两个客户端应该看到相同的bytes,那么您需要一个BLOB
列;如果他们应该看到相同的字符,那么您需要一TEXT
列。
With Base64, those two clients must ultimately find that the data decodes to the same bytes; but they should see that the encoded data has the same characters. For example, suppose one wishes to insert the Base64-encoding of 'Hello world!'
(which is 'SGVsbG8gd29ybGQh'
). If the inserting application is working in the UTF-8 character set, then it will send the byte sequence 0x53475673624738676432397962475168
to the database.
使用 Base64,这两个客户端必须最终发现数据解码为相同的字节;但他们应该看到编码数据具有相同的字符。例如,假设有人希望插入 Base64 编码'Hello world!'
(即'SGVsbG8gd29ybGQh'
)。如果插入应用程序在 UTF-8 字符集中工作,那么它将字节序列发送0x53475673624738676432397962475168
到数据库。
if that byte sequence is stored in a
BLOB
column and later retrieved by an application that is working in UTF-16*, the same byteswill be returned—which represent'升噳扇?搲?扇全'
and not the desired Base64-encoded value; whereasif that byte sequence is stored in a
TEXT
column and later retrieved by an application that is working in UTF-16, MySQL will transcode on-the-fly to return the byte sequence0x0053004700560073006200470038006700640032003900790062004700510068
—which represents the original Base64-encoded value'SGVsbG8gd29ybGQh'
as desired.
如果该字节序列存储在一个
BLOB
列和后来检索由在UTF-16工作的应用程序*,相同的字节将被返回,其代表'升噳扇?搲?扇全'
,而不是所期望的Base64编码值; 然而如果该字节序列存储在一个
TEXT
列中,然后由以 UTF-16 工作的应用程序检索,MySQL 将即时转码以返回字节序列0x0053004700560073006200470038006700640032003900790062004700510068
——它代表原始 Base64 编码值'SGVsbG8gd29ybGQh'
。
Of course, you could nevertheless use BLOB
columns and track the character encoding in some other way—but that would just needlessly reinvent the wheel, with added maintenance complexity and risk of introducing unintentional errors.
当然,您仍然可以使用BLOB
列并以其他方式跟踪字符编码——但这只会不必要地重新发明轮子,增加维护复杂性和引入意外错误的风险。
* Actually MySQL doesn't support using client character sets that are not byte-compatible with ASCII (and therefore Base64 encodings will always be consistent across any combination of them), but this example nevertheless serves to illustrate the difference between BLOB
and TEXT
column types and thus explains why TEXT
is technically correct for this purpose even though BLOB
will actually work without error (at least until MySQL adds support for non-ASCII compatible client character sets).
* 实际上 MySQL 不支持使用与 ASCII 字节不兼容的客户端字符集(因此 Base64 编码在它们的任何组合中始终保持一致),但此示例仍然用于说明BLOB
和TEXT
列类型之间的区别,因此解释了为什么TEXT
在技术上是正确的,即使BLOB
实际上不会出错(至少在 MySQL 添加对非 ASCII 兼容客户端字符集的支持之前)。