Is MongoDB GridFS fast and reliable enough for production?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3413115/

Date: 2020-09-09 11:46:45  Source: igfitidea

Is GridFS fast and reliable enough for production?

Tags: mongodb, nginx, gridfs

Asked by Railsmechanic

I'm developing a new website and I want to use GridFS as the storage for all user uploads, because it offers a lot of advantages compared to normal filesystem storage.

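For context, storing and reading back an upload through GridFS takes only a few lines of application code. Here is a minimal PyMongo sketch; the database name and file name are assumptions for illustration:

```python
# Minimal GridFS round trip with PyMongo (names are illustrative).
from pymongo import MongoClient
import gridfs

db = MongoClient("mongodb://localhost:27017")["uploads_demo"]  # hypothetical DB
fs = gridfs.GridFS(db)

# Store an upload; GridFS splits it into chunks behind the scenes.
with open("avatar.png", "rb") as f:
    file_id = fs.put(f, filename="avatar.png")

# Read it back by _id (or look it up by filename with fs.find_one()).
data = fs.get(file_id).read()
print(len(data), "bytes retrieved")
```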

Benchmarks with GridFS served by nginx indicate that it's not as fast as a normal filesystem served by nginx.


Benchmark with nginx


Is there anyone out there who already uses GridFS in a production environment, or who would use it for a new project?


Answered by Manu Eidenberger

I use GridFS at work on one of our servers, which is part of a price-comparison website with respectable traffic (around 25k visitors per day). The server doesn't have much RAM (2 GB), and even the CPU isn't really fast (Core 2 Duo 1.8 GHz), but it has plenty of storage space: 10 TB (SATA) in a RAID 0 configuration. The job the server is doing is very simple:


Each product on our price comparer has an image (there are around 10 million products according to our product DB), and the server's job is to download the image, resize it, store it in GridFS, and deliver it to the visitor's browser if it's not yet present in the grid, or simply deliver it from the grid if it's already stored there. So this could be called a 'traditional CDN schema'.

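A rough sketch of that flow in Python (the author's actual code was PHP); it uses PyMongo/GridFS plus requests and Pillow, and all names, sizes and the database are assumptions:

```python
import io

import gridfs
import requests
from PIL import Image
from pymongo import MongoClient

db = MongoClient()["price_comparer"]   # hypothetical database name
fs = gridfs.GridFS(db)

def get_product_image(product_id: str, source_url: str) -> bytes:
    # Serve straight from the grid if this image was already processed.
    cached = fs.find_one({"filename": product_id})
    if cached is not None:
        return cached.read()

    # Otherwise download, resize, store in GridFS, then deliver it.
    raw = requests.get(source_url, timeout=10).content
    img = Image.open(io.BytesIO(raw)).convert("RGB")
    img.thumbnail((300, 300))               # illustrative target size
    out = io.BytesIO()
    img.save(out, format="JPEG")
    resized = out.getvalue()
    fs.put(resized, filename=product_id)
    return resized
```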

We have stored and processed 4 million images on this server since it went into production. The resizing and storing is done by a simple PHP script... but for sure, a Python script or something like Java could be faster.


Current data size: 11.23 GB
Current storage size: 12.5 GB
Indices: 5
Index size: 849.65 MB
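
These numbers correspond to MongoDB's database statistics; if you want to check the same figures yourself, the dbStats command reports them (a hedged PyMongo sketch, database name assumed):

```python
from pymongo import MongoClient

db = MongoClient()["price_comparer"]   # hypothetical database name
stats = db.command("dbStats")
# dataSize, storageSize, indexes and indexSize map to the figures quoted above.
print(stats["dataSize"], stats["storageSize"], stats["indexes"], stats["indexSize"])
```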

About reliability: it is very reliable. The server load stays low, the index size is OK, and queries are fast.


About speed: for sure it is not as fast as local file storage, maybe 10% slower, but fast enough to be used in real time even when the image needs to be processed, which in our case is very PHP-dependent. Maintenance and development time has also been reduced: it became very simple to delete a single image or multiple images: just query the DB with a simple delete command. Another interesting thing: when we rebooted our old server with local file storage (so millions of files in thousands of folders), it would sometimes hang for hours because the system was performing a file integrity check (this really took hours...). We don't have this problem any more with GridFS; our images are now stored in big MongoDB chunks (2 GB files).

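The 'simple delete command' mentioned above looks roughly like this with PyMongo; GridFS.delete removes both the file document and its chunks (the filenames and the cutoff date are invented for illustration):

```python
from datetime import datetime, timedelta

import gridfs
from pymongo import MongoClient

db = MongoClient()["price_comparer"]   # hypothetical database name
fs = gridfs.GridFS(db)

# Delete a single image by filename.
doc = fs.find_one({"filename": "product-12345.jpg"})
if doc is not None:
    fs.delete(doc._id)

# Delete many images matching a query, e.g. everything older than a year.
cutoff = datetime.utcnow() - timedelta(days=365)
for old in fs.find({"uploadDate": {"$lt": cutoff}}):
    fs.delete(old._id)
```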

So... in my opinion... yes, GridFS is fast and reliable enough to be used in production.


Answered by Tom

As mentioned, it might not be as fast as an ordinary filesystem, but it gives you many advantages over ordinary filesystems which I think are worth giving up a bit of speed for.


Ultimately, with sharding, you might however reach a point where GridFS storage actually becomes the faster option, as opposed to an ordinary filesystem on a single node.

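For what it's worth, spreading GridFS across shards usually means sharding the chunks collection on { files_id, n }; a hedged sketch of the admin commands via PyMongo, assuming a sharded cluster and an invented database name:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")   # hypothetical mongos address

client.admin.command("enableSharding", "price_comparer")
client.admin.command(
    "shardCollection",
    "price_comparer.fs.chunks",
    key={"files_id": 1, "n": 1},   # matches the unique index GridFS creates
)
```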

Answered by schallis

mdirolf's nginx-gridfs module is great and fairly easy to set up. We're using it in production at paint.ly to serve all of the paintings and there have been no problems so far.


Answered by Nick

Heads-up on repairs for larger DBs, though: on a new system we're developing, mongo didn't exit cleanly, and repairing the 7 TB GridFS looks like it will take 130 hours.


Because of this, I think I'll look at switching to OpenStack Swift or Ceph. Still, until then it was good. And the nginx-gridfs module is sweet.


Answered by Vitaly Greck

I don't recommend using GridFS unless you know what you are doing. GridFS is just an abstraction layer that splits files into chunks and stores the files in two collections. More files means more overhead. If you expect the files to all be pretty much the same size, not exceeding 32 MB or so, you are on the right track. Do not try to store large files in GridFS. Why?


  1. Drivers in different languages may read the whole file (i.e. all chunks) even when reading only a small part of the file.
  2. Modifying a file may affect all of its chunks and increase database load. If your file storage keeps growing, you will have to decide to shard GridFS. Be careful! Consistency is not guaranteed while sharding is initializing!
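
To make the two-collection point concrete, here is a hedged sketch that stores one file and then inspects how it is represented (fs.files/fs.chunks are the GridFS default collection names; the database name and sizes are illustrative):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient()["gridfs_demo"]      # hypothetical database name
fs = gridfs.GridFS(db)

file_id = fs.put(b"x" * 1_000_000, filename="example.bin")

meta = db.fs.files.find_one({"_id": file_id})            # one metadata document
n_chunks = db.fs.chunks.count_documents({"files_id": file_id})
# With the default 255 KB chunk size, a 1 MB file becomes 4 chunk documents.
print(meta["length"], meta["chunkSize"], n_chunks)
```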

If you are thinking about a read-heavy project, consider loading the files into documents directly (if they are 16 MB or less) or choosing another cluster filesystem, and link the filename/inode into your logic.

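The 'load the files into documents directly' option means embedding the bytes in an ordinary document (BSON documents are capped at 16 MB); a hedged PyMongo sketch with invented names:

```python
from bson.binary import Binary
from pymongo import MongoClient

db = MongoClient()["gridfs_demo"]      # hypothetical database name

with open("thumbnail.jpg", "rb") as f:
    payload = f.read()                 # must stay well under the 16 MB limit

db.images.insert_one({"filename": "thumbnail.jpg", "data": Binary(payload)})

doc = db.images.find_one({"filename": "thumbnail.jpg"})
data = bytes(doc["data"])
```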

Hope this helps.
