Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/662488/

Date: 2020-09-08 07:13:50  Source: igfitidea

Would you store binary data in database or in file system?

Tags: database, binary-data

Asked by paul

This is a question which has been asked before (large-text-and-images-in-sql) but mainly for data which will be changed. In my case the data will be stored and never changed. Just seems sensible to keep everything together.

Are there any reasons why I should not store static binary data in a database?

Assuming it is a sensible thing to do, are there any advantages to storing such data in separate tables? (You might begin to realise now that I'm not a DB expert...)

Clarification: There will probably be no more than 10-20 users, but these will be in the US and in the UK. The binary data will have to be transferred in any case.

Accepted answer by Mehrdad Afshari

The advantage of storing data in the DB is that you can use the DB's security mechanisms and reduce maintenance cost (backups, ...). The disadvantage is increased DB load and consumed connections (which might be expensive for database servers licensed per connection). If you are using SQL Server 2008, FILESTREAM might be a nice alternative.

By the way, for Web apps (or any other apps that might need streaming the data), it's usually more sensible to store data outside DB.

Answer by entomo

All this talk about doing a "select * from table" causing huge memory and/or bandwidth issues when the table has a LOB in it is a non-issue. All that is returned is a pointer to the LOB in question. Not enough reputation to put the comment in-context, but people looking at this should know it's NOT an issue.

Answer by Vasil

The biggest disadvantage if you are storing BLOBs is memory consumption. Can you imagine what select * from x would do for thousands of records with a 45k image in each?

As Mehrdad said, there are also advantages. So if you decide to go with that approach, you should try to design your database so that most queries return fewer results with BLOB data in them. For example, you could move the BLOBs into their own table with a one-to-one relationship for this purpose.
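One way to read the one-to-one suggestion above is sketched below with Python's built-in sqlite3 module. The table names `photo` and `photo_blob` and the fake 45k payload are assumptions made up for illustration, not anything from the answer; the point is only that a listing query against the narrow table never reads image bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- One-to-one: the primary key allows at most one blob row per photo row.
    CREATE TABLE photo_blob (
        photo_id INTEGER PRIMARY KEY REFERENCES photo(id),
        data     BLOB NOT NULL
    );
""")

fake_image = b"\x89PNG" + bytes(45 * 1024)  # hypothetical stand-in for a ~45k image
for i in range(1, 4):
    conn.execute("INSERT INTO photo (id, title) VALUES (?, ?)", (i, f"photo {i}"))
    conn.execute(
        "INSERT INTO photo_blob (photo_id, data) VALUES (?, ?)", (i, fake_image)
    )

# Listing query: even "select *" on the narrow table returns no image bytes.
listing = conn.execute("SELECT * FROM photo ORDER BY id").fetchall()

# Detail query: fetch the bytes for a single record only when they are needed.
(image,) = conn.execute(
    "SELECT data FROM photo_blob WHERE photo_id = ?", (2,)
).fetchone()
```

Whether this split helps in practice depends on the database engine and how it stores large values, so treat it as a design sketch rather than a measured optimization.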

Answer by Nils Weinander

Addressing the issue from a principles point of view, a relational database is (mainly) there for storing structured data. If you cannot make a query condition or join on a data element it probably doesn't belong in the database. I don't see an image BLOB being used in a WHERE clause, so I'd say keep it outside the database. A CLOB on the other hand can be used in queries.

Answer by dkretz

I'm familiar with a fairly good-sized OSS project that made the decision at its inception to store images in the MySQL database, and it's proven to be among the top 3 bad ideas they have been coping with ever since. (Exacerbated by the fact that "refactor mercilessly" is anathema, but that's another story.)

Among the serious problems this has caused:

  1. Exceeding maximum efficient database size (MySQL). (The total space required for images exceeds all others by at least 2 orders of magnitude.)

  2. Image files lose their "fileness". No dates, sizes, etc. unless stored (redundantly) as dates (which require code for management).

  3. Arbitrary byte sequences don't process nicely all the time, for either storage or manipulation.

  4. "We'll never need to access the images externally" is a dangerous assumption.

  5. Fragility. Because the whole arrangement is unnatural and touchy, and you don't know where it will bite next (contributing to the anti-refactor mentality).

The benefits? None that I can think of, except it might have been the path of least resistance at the time.

Answer by JoshBerke

I think this depends on the application you're building. If you're building a CMS, and the data is going to be used to display images within a web browser, it might make sense to save the images to disk as opposed to putting them into the database. Although honestly I would do both, which could allow adding a server to a farm without having to copy files all over the place.

Another use case might be a complex object, such as a workflow, or even a business object with lots of interdependencies. You could serialize either of these into a binary or text-based format and save it in the DB. Then you get the benefits of the DB: atomicity, backups, etc...

I don't think people should be using select * queries in the first place. What you do is provide two ways to get the data: one method returns the summary information, the second returns the blob. I can't imagine why you would need to return thousands of images all at once.
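The two-accessor idea can be sketched roughly as follows, again with Python's sqlite3 module. The `document` table and the function names `list_summaries` and `get_body` are hypothetical names chosen for this example; the answer does not prescribe any particular schema or API.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory table holding metadata and the blob side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document ("
        "id INTEGER PRIMARY KEY, name TEXT, size INTEGER, body BLOB)"
    )
    for doc_id, name, body in [(1, "a.pdf", b"AAAA"), (2, "b.pdf", b"BBBBBB")]:
        conn.execute(
            "INSERT INTO document (id, name, size, body) VALUES (?, ?, ?, ?)",
            (doc_id, name, len(body), body),
        )
    return conn


def list_summaries(conn):
    """Summary accessor: names its columns explicitly and leaves the blob out."""
    return conn.execute(
        "SELECT id, name, size FROM document ORDER BY id"
    ).fetchall()


def get_body(conn, doc_id):
    """Blob accessor: returns the bytes of one document, or None if missing."""
    row = conn.execute(
        "SELECT body FROM document WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_demo_db()
summaries = list_summaries(conn)
body = get_body(conn, 2)
```

Callers that render a list only ever go through `list_summaries`, so the blob column is read one row at a time and only on demand.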

Answer by Ryan Williams

Whoever had the idea of storing an image (or other binary document) in a database is not someone I'm very happy with. Databases are meant for storage of [mostly?] INDEXABLE, DISCRETE data. Not BLOBs of meaningless binary data. If you've worked with BLOBs for binary data first-hand, you already know this.

You should store a reference to the file in the filesystem. The best practice is to store a filename, not an absolute (or even relative) path.

Answer by Walden Leverich

We store attachments in our system, and you cannot change an attachment, so I think we're on the same page w/data that "will be stored and never changed." We specifically decided not to store it in the database. We did this for two reasons: simplicity, and backup/recovery time.

Simplicity first: In our case these attachments are uploaded from the end-user's browser, and it's simpler to just write them to a directory (on the DB server) than it is to then stream them down the SQL pipe. There is a record of them in the DB, but the DB just contains meta-information about the attachment, and the name of the file on disk (a GUID in our case).
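A minimal sketch of that arrangement, assuming Python with sqlite3 standing in for the real database: the bytes go to a directory under a GUID file name, and the DB row holds only metadata plus that name. The `attachment` table, the helper names, and the temporary directory are illustrative assumptions, not details from the answer.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


def save_attachment(conn, attachment_dir: Path, original_name: str, payload: bytes) -> str:
    """Write the bytes to disk under a GUID name and record metadata in the DB."""
    stored_name = uuid.uuid4().hex  # a bare file name, not a path
    (attachment_dir / stored_name).write_bytes(payload)
    conn.execute(
        "INSERT INTO attachment (stored_name, original_name, size) VALUES (?, ?, ?)",
        (stored_name, original_name, len(payload)),
    )
    return stored_name


def load_attachment(conn, attachment_dir: Path, stored_name: str):
    """Look up the metadata row, then read the bytes from the directory."""
    row = conn.execute(
        "SELECT original_name FROM attachment WHERE stored_name = ?", (stored_name,)
    ).fetchone()
    if row is None:
        raise KeyError(stored_name)
    return row[0], (attachment_dir / stored_name).read_bytes()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachment ("
    "stored_name TEXT PRIMARY KEY, original_name TEXT NOT NULL, size INTEGER NOT NULL)"
)
attachment_dir = Path(tempfile.mkdtemp())  # stand-in for the real attachment directory

key = save_attachment(conn, attachment_dir, "report.pdf", b"%PDF-1.4 demo")
name, data = load_attachment(conn, attachment_dir, key)
```

Because only the file name is stored, the attachment directory can be moved or copied to another server by changing one configuration value rather than rewriting rows.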

On the backup/recovery side: These blobs will likely become one of the largest pieces of your database. Whenever you run a full backup you'll be copying these bits over and over, even though you know they can never change. To us it just seemed much simpler to have (much) smaller backups, and do an xcopy of the attachment directory to a secondary server as the backup.

Answer by JonoW

The performance issue here has been addressed above, so I won't repeat it. But I think a good tip, if you are storing things that will be streamed out a lot (such as images/documents on a web site), is to build in a caching system.

By this I mean store all the data in your database, but when someone requests a file, check whether it exists on disk (based on a known filename, in a temp folder). If not, grab it from the DB, write it to the folder, and then stream that to the user. For the next request for the same file, since it exists on disk, it can be served from there without hitting the DB. And if you need to delete these files (or your web server goes kaput!), it doesn't matter, as they will be rebuilt from the DB as people request them. This should be much quicker than serving each request for the same file from the DB.
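The caching scheme described above could look roughly like this in Python. The `stored_file` table, the `<id>.bin` naming rule, and the `db_reads` counter are assumptions added for the demo; the sketch also assumes a single process and data that never changes once stored, so no cache invalidation is attempted.

```python
import sqlite3
import tempfile
from pathlib import Path

db_reads = 0  # counts DB fetches so the demo can show when the cache is used


def fetch_from_db(conn, file_id: int) -> bytes:
    """Read the blob for one file from the database (the slow path)."""
    global db_reads
    db_reads += 1
    row = conn.execute(
        "SELECT data FROM stored_file WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0]


def get_file(conn, cache_dir: Path, file_id: int) -> bytes:
    """Serve from the disk cache if present, otherwise rebuild it from the DB."""
    cached = cache_dir / f"{file_id}.bin"  # known file name derived from the key
    if cached.exists():
        return cached.read_bytes()
    data = fetch_from_db(conn, file_id)
    tmp = cached.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(cached)  # rename into place once the file is fully written
    return data


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stored_file (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")
conn.execute("INSERT INTO stored_file (id, data) VALUES (1, ?)", (b"image bytes",))
cache_dir = Path(tempfile.mkdtemp())  # stand-in for the web server's temp folder

first = get_file(conn, cache_dir, 1)    # cache miss: reads the DB, writes the file
second = get_file(conn, cache_dir, 1)   # cache hit: served from disk
(cache_dir / "1.bin").unlink()          # simulate the cache being wiped
third = get_file(conn, cache_dir, 1)    # rebuilt from the DB on demand
```

In this demo the database is read twice for three requests: once on the first miss and once after the cached file is deleted.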

Answer by Michael Buen

Some databases (e.g. PostgreSQL) automatically compress fields, so reading them directly from the DB may be faster. Also, the program can read all the fields and the image in one swoop.