File Storage for Web Applications: Filesystem vs DB vs NoSQL engines

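Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/2890452/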

Date: 2020-08-31 16:06:58  Source: igfitidea

Tags: mysql, database, web-applications, file, nosql

Asked by El Yobo

I have a web application that stores a lot of user generated files. Currently these are all stored on the server filesystem, which has several downsides for me.

  • When we move "folders" (as defined by our application) we also have to move the files on disk (although this is more due to strange design decisions on the part of the original developers than a requirement of storing things on the filesystem).
  • It's hard to write tests for file system actions; I have a mock filesystem class that logs actions like move, delete etc, without performing them, which more or less does the job, but I don't have 100% confidence in the tests.
  • I will be adding some other jobs which need to access the files from other services to perform additional tasks (e.g. indexing in Solr, generating thumbnails, movie format conversion), so I need to get at the files remotely. Doing this over network shares seems dodgy...
  • Dealing with permissions on the filesystem has sometimes given us problems in the past, although now that we've moved to a pure Linux environment this should be less of an issue.
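
On the testing point: one common alternative to a mock that only logs actions is to put all storage behind a small interface and give tests an in-memory implementation that actually performs the operations, so assertions can check the resulting state. A minimal Python sketch; `FileStore`, `InMemoryFileStore` and `move_folder` are hypothetical names, not taken from the application described above:

```python
from abc import ABC, abstractmethod


class FileStore(ABC):
    """Hypothetical storage interface that application code depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def move(self, src: str, dst: str) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryFileStore(FileStore):
    """Test double: really performs each operation, but on a dict instead of a disk."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def get(self, key: str) -> bytes:
        return self._files[key]  # KeyError plays the role of "file not found"

    def move(self, src: str, dst: str) -> None:
        if dst in self._files:
            raise FileExistsError(dst)
        self._files[dst] = self._files.pop(src)

    def delete(self, key: str) -> None:
        del self._files[key]

    def exists(self, key: str) -> bool:
        return key in self._files


def move_folder(store: FileStore, keys: list[str], old: str, new: str) -> list[str]:
    """Hypothetical application logic: move every key under `old/` to `new/`."""
    moved = []
    for key in keys:
        if key.startswith(old + "/"):
            target = new + key[len(old):]
            store.move(key, target)
            moved.append(target)
    return moved
```

A filesystem-backed `FileStore` (or a DB/Cassandra-backed one) would then be the only code that touches real storage, and it can be tested separately with a small number of integration tests.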

So, my main questions are

  • What are the downsides of storing files as BLOBs in MySQL?
  • Do the same problems exist with NoSQL systems like Cassandra?
  • Does anyone have any other suggestions that might be appropriate, e.g. MogileFS, etc?

Accepted answer by Pascal Thivent

Not a direct answer, but some pointers to very interesting and somewhat similar questions (yeah, they are about blobs and images, but this is IMO comparable).

What are the downsides of storing files as BLOBs in MySQL?

Do the same problems exist with NoSQL systems like Cassandra?

PS: I don't want to be the killjoy but I don't think that any NoSQL solution is going to solve your problem (NoSQL is just irrelevant for most businesses).

Answer by Randy

Maybe a hybrid solution.

Use a database to store metadata about each file - and use the file system to actually store the file.

Any restructuring of 'folders' could then be modelled in the DB alone, decoupled from the file's actual location on disk.
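
A minimal sketch of this hybrid layout, using Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. The schema, the random `uuid4` file names and all function names are assumptions for illustration: the file's bytes are written once under an opaque name, and the application "folder" exists only as a column, so moving a folder is a single `UPDATE`.

```python
import os
import sqlite3
import tempfile
import uuid


def make_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the application "folder" is just a column, so
    # reorganising folders never touches the bytes on disk.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY,"
        " folder TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " storage_key TEXT NOT NULL UNIQUE)"
    )


def _disk_path(root: str, key: str) -> str:
    # Fan out into subdirectories so no single directory gets huge.
    return os.path.join(root, key[:2], key)


def save_file(conn: sqlite3.Connection, root: str, folder: str, name: str, data: bytes) -> str:
    key = uuid.uuid4().hex  # opaque on-disk name, independent of the folder
    path = _disk_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    with conn:  # commit the metadata row
        conn.execute(
            "INSERT INTO files (folder, name, storage_key) VALUES (?, ?, ?)",
            (folder, name, key),
        )
    return key


def move_folder(conn: sqlite3.Connection, old: str, new: str) -> int:
    # Metadata-only update of an exact folder name; returns rows changed.
    with conn:
        cur = conn.execute("UPDATE files SET folder = ? WHERE folder = ?", (new, old))
    return cur.rowcount


def read_file(conn: sqlite3.Connection, root: str, folder: str, name: str) -> bytes:
    row = conn.execute(
        "SELECT storage_key FROM files WHERE folder = ? AND name = ?",
        (folder, name),
    ).fetchone()
    if row is None:
        raise FileNotFoundError(f"{folder}/{name}")
    with open(_disk_path(root, row[0]), "rb") as f:
        return f.read()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    save_file(conn, root, "projects/2010", "plan.txt", b"hello")
    move_folder(conn, "projects/2010", "archive/2010")
    print(read_file(conn, root, "archive/2010", "plan.txt"))
```

Because on-disk names never change, other services (thumbnailing, indexing) can be handed a stable `storage_key` rather than a path that moves whenever a user reorganises folders.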

Answer by jbellis

You can store files up to 2GB easily in Cassandra by splitting them into 1MB columns or so. This is pretty common.

You could store it as one big column too, but then you'd have to read the whole thing into memory when accessing it.
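
The chunking idea can be sketched in Python without a live cluster. Here a plain dict keyed by chunk index stands in for the file's Cassandra row, with one entry per ~1 MB column; the dict and the function names are assumptions for illustration, and a real implementation would issue one write (and, when reading, one fetch) per chunk through a Cassandra driver.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # ~1 MB per column, as the answer suggests


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Read a file-like object piece by piece, yielding (chunk_index, bytes)."""
    index = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield index, block
        index += 1


def store_chunked(row: Dict[int, bytes], stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Write each chunk as its own entry; `row` stands in for the file's Cassandra row."""
    count = 0
    for index, block in iter_chunks(stream, chunk_size):
        row[index] = block  # with a real driver: one write per chunk
        count += 1
    return count


def load_chunked(row: Dict[int, bytes], out: BinaryIO) -> None:
    """Write chunks out in index order; a real store could fetch them one at a time."""
    for index in range(len(row)):
        out.write(row[index])


if __name__ == "__main__":
    data = b"x" * (2 * CHUNK_SIZE + 10)
    row: Dict[int, bytes] = {}
    print(store_chunked(row, io.BytesIO(data)))  # 3
    out = io.BytesIO()
    load_chunked(row, out)
    print(out.getvalue() == data)
```

Reading chunk by chunk is what avoids the problem mentioned above: a reader only needs one ~1 MB piece in memory at a time instead of the whole file.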

Answer by Marcus Adams

If the OS or application doesn't need access to the files, then there's no real need to store the files on the file system. If you want to back up the files at the same time you back up the database, then there's less benefit to storing them outside the database. Therefore, storing the files in the database might be a valid solution.
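
As a rough illustration of keeping the files themselves in the database, the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server; the table and function names are hypothetical. It shows the point made above: because file contents sit in the same database as their metadata, a single database backup covers both.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    # In MySQL this column would more likely be LONGBLOB; SQLite just uses BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )


def put_file(conn: sqlite3.Connection, name: str, data: bytes) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO files (name, content) VALUES (?, ?)", (name, data)
        )
    return cur.lastrowid


def get_file(conn: sqlite3.Connection, file_id: int) -> bytes:
    row = conn.execute("SELECT content FROM files WHERE id = ?", (file_id,)).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0]


def backup_database(src: sqlite3.Connection, dest: sqlite3.Connection) -> None:
    # One database backup captures metadata and file contents together.
    src.backup(dest)
```

With MySQL specifically, the size of a file written in a single statement is also bounded by the server's `max_allowed_packet` setting.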

An additional downside is that processing files in the DB has more overhead than processing files at the file system level. However, as long as the advantages outweigh the downsides, and it seems they might in your case, you might give it a try.

My main concern would be managing disk storage. As your database files get large, managing your entire database gets more complicated. You don't want to move out of the frying pan and into the fire.

我主要关心的是管理磁盘存储。随着数据库文件变大,管理整个数据库变得更加复杂。您不想从煎锅中移出并进入火中。