MySQL 将数据存储在 blob 中与存储指向文件的指针有什么区别?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13435187/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is difference between storing data in a blob, vs. storing a pointer to a file?
提问by user1832628
I have a question about the blob
data type in MySQL.
我对blob
MySQL 中的数据类型有疑问。
I read that the data type can be used to store files. I also read that an alternative is to store the file on disk and include a pointer to its location in the database (via a varchar column).
我读到数据类型可用于存储文件。我还读到,另一种方法是将文件存储在磁盘上,并在数据库中包含一个指向其位置的指针(通过 varchar 列)。
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents. So is that any different than storing a pointer to a file on the file system?
但我有点困惑,因为我读过 blob 字段不是按行存储的,需要单独查找才能检索其内容。那么这与在文件系统上存储指向文件的指针有什么不同吗?
采纳答案by Bruno Vieira
I read that the data type can be used to store files.
我读到数据类型可用于存储文件。
According to MySQL manualpage on Blob, A BLOB
is a binary large object that can hold a variable amount of data.
根据Blob上的MySQL 手册页,ABLOB
是一个二进制大对象,可以保存可变数量的数据。
Since it's a data type specific to store binary data it's common to use it to store files in binary format, being storing image files a very common use on web applications.
由于它是一种特定于存储二进制数据的数据类型,因此通常使用它来存储二进制格式的文件,存储图像文件在 Web 应用程序中非常常见。
For web applications this would mean that you would first need to convert your file into binary format and then store it, and every time you need to retrieve your file you would need to do the reverse process of converting them back to it's original format.
对于 Web 应用程序,这意味着您首先需要将文件转换为二进制格式然后存储它,并且每次需要检索文件时,您都需要执行将它们转换回原始格式的相反过程。
Besides that, storing large amount of data in your db MAYslow it down. Specially in systems that are not dedicated only to host a database.
除此之外,在您的数据库中存储大量数据可能会减慢速度。特别是在不专门用于托管数据库的系统中。
I also read that an alternative is to store the file on disk and include a pointer to its location in the database
我还读到另一种方法是将文件存储在磁盘上并包含一个指向其在数据库中位置的指针
Bearing in mind all above considerations a common practice for web applications is to store your files elsewhere than your MySQL and then simply store it's path on your database. This approach MAYspeed up your database when dealing with large amount of data.
牢记上述所有考虑因素,Web 应用程序的常见做法是将文件存储在 MySQL 以外的其他位置,然后简单地将其路径存储在数据库中。在处理大量数据时,这种方法可能会加速您的数据库。
But I'm a little confused because I've read that blob fields are not stored in-row and require a separate look-up to retrieve its contents.
但我有点困惑,因为我读过 blob 字段不是按行存储的,需要单独查找才能检索其内容。
In fact that would depend on what storage engine you are using since every engine treats data and stores it in different ways. For the InnoDB engine, which is suited for relational database you may want to read this article from MySQL Performance blogon how the blob is stored in MySQL.
事实上,这取决于您使用的存储引擎,因为每个引擎都以不同的方式处理数据和存储数据。对于适用于关系数据库的 InnoDB 引擎,您可能需要阅读MySQL Performance 博客中关于 blob 如何存储在 MySQL 中的这篇文章。
But in abstract, on MySQL 5 and forward the blob is stored as following:
但抽象地说,在 MySQL 5 和转发上,blob 存储如下:
Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them.
Innodb 要么在行页上存储整个 blob,要么只存储 20 字节的 BLOB 指针,优先存储在页面上的较小列,这是合理的,因为您可以存储更多列。
So you are probably thinking now that the right way to go is to store them as separate file, but there are some advantages of using blob to store data, the first one (in my opinion) is the backup. I manage a small server and I had to create another subroutine only to copy my files stored as paths to another storage disk (We couldn't afford to buy a decent tape backup system). If I had designed my application to use blobs a simple mysqldump
would be everything that I needed to backup my whole database.
所以你现在可能在想正确的方法是将它们存储为单独的文件,但是使用 blob 存储数据有一些优点,第一个(在我看来)是备份。我管理一个小型服务器,我不得不创建另一个子例程,只是为了将我作为路径存储的文件复制到另一个存储磁盘(我们买不起像样的磁带备份系统)。如果我将我的应用程序设计为使用 blob,mysqldump
那么备份整个数据库所需的一切都非常简单。
The advantage of storing blobs for backups are better discussed on this postwhere the person who answered had a similar problem than mine.
在这篇文章中更好地讨论了为备份存储 blob 的优势,其中回答问题的人与我的问题相似。
Another advantage is security and the easiness of managing permission and access. All the data inside your MySQL server is password protected and you can easily manage permissions for your users about who access what and who doesn't.
另一个优点是安全性以及管理权限和访问的简便性。MySQL 服务器中的所有数据都受密码保护,您可以轻松管理用户的权限,了解谁访问什么,谁不访问。
In a application which relies on MySQL privileges system for authentication and use. It's certain a plus since it would be a little harder for let's say an invader to retrieve an image (or a binary file like a zipped one) from your disk or an user without access privileges to access it.
在依赖 MySQL 权限系统进行身份验证和使用的应用程序中。这肯定是一个加分项,因为假设入侵者从您的磁盘或没有访问权限的用户检索图像(或压缩文件之类的二进制文件)会有点困难。
So I'd say that
所以我会说
If you gonna manage your MySQL and all the data you have in it and must do regular backups or intend to change or even consider a future change of OS, and have a decent hardware and optimized your MySQL to it, go for BLOB.
如果您要管理 MySQL 及其中的所有数据,并且必须定期备份或打算更改甚至考虑将来更改操作系统,并且拥有不错的硬件并针对它优化了您的 MySQL,请选择 BLOB。
If you will notmanage your MySQL (as in a web host for example) and doesn't intend to change OS or make backups, stick with varchar
columns pointing to your files.
如果您不会管理您的 MySQL(例如在 Web 主机中)并且不打算更改操作系统或进行备份,请坚持使用varchar
指向您的文件的列。
I hope it helped. Cheers
我希望它有所帮助。干杯
回答by mvp
If you store data is BLOB field, you are making it part of your object abstraction.
如果您存储的数据是 BLOB 字段,则您将其作为对象抽象的一部分。
BLOB advantages:
BLOB 优势:
Should you want to remove row with BLOB, or remove it as part of master/slave table relationship or maybe the whole table hierarchy, your BLOB is handled automatically and has same lifetime as any other object in database.
Your scripts do not have a need to access anything but database to get everything they require. In many situations, having direct file access open whole can of worms on how to bypass access or security restrictions. For example, with file access, they may have to mount filesystems which contain actual files. But with BLOB in database, you only have to be able to connect to database, no matter where you are.
If you store it in file and file is replaced, removed or no longer accessible, your database would never know - in effect, you cannot guarantee integrity. Also, it is difficult to reliably support multiple versions when using files. If you use and depend on transactions, it becomes almost impossible.
如果您想使用 BLOB 删除行,或者将其作为主/从表关系的一部分或整个表层次结构删除,您的 BLOB 会自动处理,并且与数据库中的任何其他对象具有相同的生命周期。
您的脚本无需访问任何数据库,即可获取所需的一切。在许多情况下,直接文件访问打开了关于如何绕过访问或安全限制的完整蠕虫。例如,对于文件访问,他们可能必须挂载包含实际文件的文件系统。但是有了数据库中的 BLOB,您只需能够连接到数据库,无论您身在何处。
如果您将其存储在文件中并且文件被替换、删除或不再可访问,您的数据库将永远不会知道 - 实际上,您无法保证完整性。此外,在使用文件时很难可靠地支持多个版本。如果你使用和依赖事务,这几乎是不可能的。
File advantages:
档案优势:
Some databases handle BLOBs rather poorly. For example, while official BLOB limit in MySQL is 4GB, but in reality it is only 1MB in default configuration. You can increase this to 16-32MB by tweaking both client and server configuration to increase MySQL command buffer, but this has a lot of other implications in terms of performance and security.
Even if database does not have some weird size limits, it always will have some overhead in storing BLOB compared to just a file. Also, if BLOB is large, some databases do not provide interface to access blob piece by piece, or
stream
it, which can be large impediment for your workflow.
一些数据库处理 BLOB 的能力相当差。例如,虽然 MySQL 中的官方 BLOB 限制是 4GB,但实际上在默认配置下只有 1MB。您可以通过调整客户端和服务器配置来增加 MySQL 命令缓冲区,将其增加到 16-32MB,但这对性能和安全性有很多其他影响。
即使数据库没有一些奇怪的大小限制,与仅存储文件相比,它在存储 BLOB 方面总是会有一些开销。此外,如果 BLOB 很大,一些数据库不提供逐个访问 blob 的接口,或者
stream
它,这可能会大大阻碍您的工作流程。
In the end, it is up to you. I typically try to keep it in BLOB, unless this creates unreasonable performance problems.
最后,这取决于你。我通常会尝试将其保留在 BLOB 中,除非这会造成不合理的性能问题。
回答by Bill Karwin
Yes, MySQL blobs that don't fit within the same page as a row get stored on overflow pages Note that some blobs are small enough that they're stored with the rest of the row, like any other column. The blob pages are not adjacent to the page their row is stored on, so they may result in extra I/O to read them.
是的,与行不在同一页内的 MySQL Blob 会存储在溢出页上。请注意,某些 Blob 足够小,可以与行的其余部分一起存储,就像任何其他列一样。blob 页面与其所在行所在的页面不相邻,因此它们可能会导致额外的 I/O 读取它们。
On the other hand, just like with any other page type, blob pages can occupy memory in the InnoDB buffer pool, so reading the blobs subsequently is very fast even if they are on separate pages. Files can be cached by the operating system, but typically they're read from disk.
另一方面,就像任何其他页面类型一样,blob 页面可以占用 InnoDB 缓冲池中的内存,因此即使它们位于不同的页面上,后续读取 blob 也非常快。文件可以由操作系统缓存,但通常它们是从磁盘读取的。
Here are a few other factors that may affect your decision:
以下是一些可能会影响您的决定的其他因素:
Blobs are stored logically with a row. This means if you DELETE the row, the associated blob is deleted automatically. But if you store the blob outside the database, you end up with orphaned blob files after you delete rows from the database. You have to do manual steps to find and delete these files.
Blobs stored in the row also follow transaction semantics. For instance, a new blob or an updated blob is invisible to other transactions until you commit. You can also roll back a change. Storing blobs in files outside the database makes this a lot harder.
When you back up a database containing blobs, the database is a lot bigger of course, but when you backup, you get all the data andassociated blobs in one step. If you store blobs externally, you have to back up the database and also back up the filesystem where you store blob files. If you need to ensure that the data and blobs are captured from one instant in time, you pretty much need to use some kind of filesystem snapshots.
If you use replication, the only automatic way of ensuring the blobs get copied to the replication slave automatically is to store blobs in the database.
Blob 以一行逻辑存储。这意味着如果您 DELETE 行,关联的 blob 将自动删除。但是,如果将 blob 存储在数据库之外,则在从数据库中删除行后,最终会得到孤立的 blob 文件。您必须执行手动步骤来查找和删除这些文件。
存储在行中的 Blob 也遵循事务语义。例如,在您提交之前,新的 blob 或更新的 blob 对其他事务是不可见的。您还可以回滚更改。将 blob 存储在数据库外的文件中会使这变得更加困难。
当您备份包含 blob 的数据库时,该数据库当然要大得多,但是当您备份时,只需一步即可获得所有数据和相关 blob。如果在外部存储 blob,则必须备份数据库并备份存储 blob 文件的文件系统。如果您需要确保从一个瞬间捕获数据和 blob,您几乎需要使用某种文件系统快照。
如果使用复制,确保 blob 自动复制到复制从属的唯一自动方法是将 blob 存储在数据库中。
回答by SaidbakR
The better approach is to store your file in the filesystem folder and point to their paths through a varchar field in the database. One of the drawbacks of saving files in the database is slowing it or reducing its performance.
更好的方法是将文件存储在文件系统文件夹中,并通过数据库中的 varchar 字段指向它们的路径。将文件保存在数据库中的缺点之一是减慢速度或降低其性能。
回答by Science_Fiction
Filesystem access will be faster than through the database. Blobs columns have some disadvantages in terms of indexing/sorting etc, which you could do with your filename column if you wished to in the future.
文件系统访问将比通过数据库更快。Blob 列在索引/排序等方面有一些缺点,如果您将来愿意,可以对文件名列进行处理。
The database can also grow quickly with large blobs and then tasks like backing up become slower. I would go with a file location in database with the physical storage on the file system.
数据库也会随着大 blob 快速增长,然后备份等任务会变慢。我会使用数据库中的文件位置和文件系统上的物理存储。