Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10801158/

Date: 2020-08-06 06:33:52  Source: igfitidea

How stable is s3fs to mount an Amazon S3 bucket as a local directory

Tags: linux, amazon-s3, s3fs

Asked by arod

How stable is s3fs for mounting an Amazon S3 bucket as a local directory on Linux? Is it recommended and stable enough for high-demand production environments?

Are there any better/similar solutions?

Update: Would it be better to use EBS and mount it via NFS on all the other AMIs?

Accepted answer by reach4thelasers

There's a good article on s3fs here; after reading it, I resorted to an EBS share.

It highlights a few important considerations when using s3fs, mostly related to the inherent limitations of S3:

  • no file can be over 5GB
  • you can't partially update a file so changing a single byte will re-upload the entire file.
  • operations on many small files are very efficient (each is a separate S3 object after all), but operations on large files are very inefficient
  • Though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this so if you want to read just one byte of a 1GB file, you'll have to download the entire GB.

Whether s3fs is a feasible option therefore depends on what you are storing. If you're storing, say, photos, where you always write or read an entire file and never change it incrementally, then it's fine. One may ask, though: if that's all you're doing, why not just use S3's API directly?

If you're talking about application data (say database files or log files) where you want to make small incremental changes, then it's a definite no. S3 just doesn't work that way: you can't incrementally change a file.

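The whole-object behaviour described above can be illustrated with a toy model. This is only a sketch: `WholeObjectStore` and `change_one_byte` are hypothetical names, the store is an in-memory dict rather than S3, and it models the limitation as the answer describes it, not how s3fs is actually implemented.

```python
# Toy model of whole-object storage. NOT the real S3 or s3fs API:
# every name here is hypothetical and exists only for illustration.
class WholeObjectStore:
    """Objects can only be read or written in full."""

    def __init__(self):
        self.objects = {}
        self.bytes_uploaded = 0
        self.bytes_downloaded = 0

    def put(self, key, data):
        # A write always transfers the complete object.
        self.objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key):
        # A read always transfers the complete object.
        data = self.objects[key]
        self.bytes_downloaded += len(data)
        return data


def change_one_byte(store, key, offset, value):
    """Download everything, patch one byte, upload everything."""
    data = bytearray(store.get(key))
    data[offset] = value
    store.put(key, data)


store = WholeObjectStore()
size = 1024 * 1024  # a 1 MiB object, kept small so the sketch runs quickly
store.put("photo.jpg", bytes(size))
store.bytes_uploaded = 0  # ignore the initial upload

change_one_byte(store, "photo.jpg", offset=10, value=0xFF)
print(store.bytes_downloaded, store.bytes_uploaded)
```

In this model a one-byte change moves the full object in both directions, which is why the pattern is tolerable for write-once files such as photos and a poor fit for databases or log files.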

The article mentioned above also discusses a similar application, s3backer, which gets around the performance issues by implementing a virtual filesystem over S3. It has a few issues of its own, however:

  • High risk for data corruption, due to the delayed writes
  • too small block sizes (e.g., the 4K default) can add significant extra costs (e.g., $130 for 50GB worth of storage with 4K blocks)
  • too large block sizes can add significant data transfer and storage fees.
  • memory usage can be prohibitive: by default it caches 1000 blocks.
    With the default 4K block size that's not an issue but most users
    will probably want to increase block size.
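
The block-size trade-off in the list above can be made concrete with some arithmetic. The sketch below assumes that each block is stored as one S3 object written with one PUT request, and it assumes a price of $0.01 per 1,000 PUT requests; neither assumption comes from the answer, which does not say how its $130 figure was derived. The function names are hypothetical.

```python
GIB = 1024 ** 3
KIB = 1024

# ASSUMPTION: $0.01 per 1,000 PUT requests. This price is not stated
# in the answer; substitute the current rate for your region.
PUT_PRICE_PER_1000 = 0.01


def block_count(volume_bytes, block_bytes):
    # Round up so a partial trailing block still counts as one object.
    return -(-volume_bytes // block_bytes)


def put_cost(volume_bytes, block_bytes, price_per_1000=PUT_PRICE_PER_1000):
    # ASSUMPTION: one PUT request per block to fill the volume once.
    return block_count(volume_bytes, block_bytes) * price_per_1000 / 1000


def cache_bytes(block_bytes, cached_blocks=1000):
    # Memory held by a cache of `cached_blocks` blocks (1000 per the list above).
    return block_bytes * cached_blocks


for block in (4 * KIB, 128 * KIB, 1024 * KIB):
    blocks = block_count(50 * GIB, block)
    print(f"{block // KIB:>5} KiB blocks: {blocks:>10,} objects, "
          f"${put_cost(50 * GIB, block):,.2f} in PUTs, "
          f"{cache_bytes(block) / (1024 * 1024):,.1f} MiB cache")
```

Under these assumptions the 4 KiB case comes to about 13.1 million objects and roughly $131 in PUT requests, in the same range as the $130 quoted above, while 1 MiB blocks push a 1000-block cache to about 1000 MiB of memory.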

I resorted to an EBS-mounted drive shared from an EC2 instance. But you should know that, although it is the most performant option, an EBS-backed NFS share has one big problem of its own: it is a single point of failure. If the machine sharing the EBS volume goes down, every machine that accesses the share loses access.

This is a risk I was able to live with and was the option I chose in the end. I hope this helps.

Answer by aleemb

This is an old question so I'll share my experience over the past year with S3FS.

Initially, it had a number of bugs and memory leaks (I had a cron job to restart it every 2 hours), but with the latest release, 1.73, it has been very stable.

The best thing about S3FS is that you have one less thing to worry about and you get some performance benefits for free.

Most of your S3 requests are going to be PUT (~5%) and GET (~95%). If you don't need any post-processing (thumbnail generation, for example), you shouldn't be hitting your web server in the first place and should be uploading directly to S3 (using CORS).

If you are hitting the server, that probably means you need to do some post-processing on images. With an S3 API you'll be uploading to the server, then uploading to S3. If the user wants to crop, you'll need to download the image from S3 back to the server, crop it, and then upload it to S3 again. With S3FS and local caching turned on, this orchestration is taken care of for you, and it saves downloading files from S3.

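A minimal sketch of why a local cache removes the extra download in the crop scenario above. `CountingRemote` and `CachedStore` are hypothetical names, the "remote" is an in-memory dict standing in for S3, and the cache logic is a generic write-through, read-through cache; it is an assumption about the general idea, not a description of how S3FS implements its caching.

```python
class CountingRemote:
    """Hypothetical stand-in for S3: an in-memory dict that counts transfers."""

    def __init__(self):
        self.objects = {}
        self.downloads = 0
        self.uploads = 0

    def download(self, key):
        self.downloads += 1
        return self.objects[key]

    def upload(self, key, data):
        self.uploads += 1
        self.objects[key] = data


class CachedStore:
    """Write-through, read-through local cache in front of the remote."""

    def __init__(self, remote):
        self.remote = remote
        self.cache = {}  # stands in for a local cache directory

    def write(self, key, data):
        # Keep a local copy and push the full object to the remote.
        self.cache[key] = data
        self.remote.upload(key, data)

    def read(self, key):
        # Only go to the remote when the object is not cached locally.
        if key not in self.cache:
            self.cache[key] = self.remote.download(key)
        return self.cache[key]


remote = CountingRemote()
store = CachedStore(remote)

store.write("avatar.png", b"0123456789")  # initial upload from the user
original = store.read("avatar.png")       # served from the local cache
store.write("avatar.png", original[2:8])  # "crop", then write the result back

print(remote.downloads, remote.uploads)
```

In this model the crop costs two uploads and no download, whereas the uncached flow described above would add a download from the remote before the crop.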

On caching, if you are caching to an ephemeral drive on EC2, you get the performance benefits that come with it and can purge your cache without having to worry about anything. Unless you run out of disk space, you should have no reason to purge your cache. This makes traversing operations like searching and filtering much easier.

The one thing I do wish it had is full sync with S3 (rsync style). That would make it an enterprise version of Dropbox or Google Drive for S3, but without having to contend with the quotas and fees that come with them.
