Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/2748825/

Date: 2020-09-09 11:43:49  Source: igfitidea

What is the recommended approach towards multi-tenant databases in MongoDB?

Tags: mongodb, multi-tenant

Asked by Braintapper

I'm thinking of creating a multi-tenant app using MongoDB. I don't yet have any estimate of how many tenants I'd have, but I would like to be able to scale into the thousands.

I can think of three strategies:

  1. All tenants in the same collection, using tenant-specific fields for security
  2. 1 Collection per tenant in a single shared DB
  3. 1 Database per tenant
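
To make the three options concrete, here is a minimal sketch of where a tenant's documents would live under each one. The database name "app", the "tenant_id" field, and the naming patterns are hypothetical choices for illustration only; nothing in the question prescribes them.

```python
from typing import NamedTuple


class Target(NamedTuple):
    database: str      # database that holds the tenant's documents
    collection: str    # collection inside that database
    base_filter: dict  # filter that every query for this tenant must include


def option_1(tenant_id: str, entity: str) -> Target:
    # One shared collection; tenants are told apart by a field on each document.
    return Target("app", entity, {"tenant_id": tenant_id})


def option_2(tenant_id: str, entity: str) -> Target:
    # One shared database; each tenant gets its own collection per entity.
    return Target("app", f"{tenant_id}_{entity}", {})


def option_3(tenant_id: str, entity: str) -> Target:
    # One database per tenant; collection names are the same in every database.
    return Target(f"tenant_{tenant_id}", entity, {})


if __name__ == "__main__":
    for option in (option_1, option_2, option_3):
        print(option.__name__, option("acme", "invoices"))
```

Only option 1 needs a filter on every query; options 2 and 3 move the separation into the collection or database name instead.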

The voice in my head is suggesting that I go with option 2.

Thoughts and implications, anyone?

Accepted answer by Ruslan Kiskinov

I have the same problem to solve and am also considering the variants. Since I have years of experience creating SaaS multi-tenant applications, I was also going to select the second option, based on my previous experience with relational databases.

While doing my research I found this article on a MongoDB support site (Wayback Machine link added since the original is gone): https://web.archive.org/web/20140812091703/http://support.mongohq.com/use-cases/multi-tenant.html

The authors say to avoid the 2nd option at any cost, which, as I understand it, is not specific to MongoDB. My impression is that this applies to most of the NoSQL databases I researched (CouchDB, Cassandra, Couchbase Server, etc.) due to the specifics of the database design.

Collections (or buckets, or whatever they are called in different DBs) are not the same thing as security schemas in an RDBMS. Although they behave as containers for documents, they are useless for enforcing good tenant separation. I couldn't find a NoSQL database that can apply security restrictions based on collections.

Of course, you can use MongoDB role-based security to restrict access at the database/server level (http://docs.mongodb.org/manual/core/authorization/).
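
As a rough illustration of database-level restriction, the sketch below builds a `createUser` command document that grants the built-in `readWrite` role on a single tenant database. The function name, database name, and credentials are hypothetical; only the command shape (`createUser`, `pwd`, `roles`) comes from MongoDB itself.

```python
def create_tenant_user_command(tenant_db: str, username: str, password: str) -> dict:
    # Command document for MongoDB's createUser command. The user receives
    # the built-in readWrite role on its own tenant database and nothing else.
    return {
        "createUser": username,
        "pwd": password,
        "roles": [{"role": "readWrite", "db": tenant_db}],
    }


if __name__ == "__main__":
    # Hypothetical tenant database and credentials, for illustration only.
    print(create_tenant_user_command("tenant_acme", "acme_app", "change-me"))
```

With a driver such as pymongo, a document like this would typically be passed to `Database.command` on the tenant database by an administrator account; that step is left out here so the sketch runs without a server.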

I would recommend 1st option when:

  • You have enough time and resources to deal with the complexity of designing, implementing and testing this scenario.
  • You are not going to have many differences in database structure and functionality between tenants.
  • Your application design will allow tenants to make only minimal customizations at runtime.
  • You want to optimize space and minimize usage of hardware resources.
  • You are going to have thousands of tenants.
  • You want to scale out fast and at good cost.
  • You are NOT going to back up data per tenant (keep separate backups for each tenant). It is possible to do that even in this scenario, but the effort will be huge.
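
Much of the design and testing complexity of the 1st option comes from making sure every query is scoped to one tenant. A minimal sketch of one way to do that is below, assuming a hypothetical `tenant_id` field on every document and a helper that all data-access code goes through.

```python
from typing import Optional


def scoped_filter(tenant_id: str, user_filter: Optional[dict] = None) -> dict:
    # Always constrain the query to one tenant. Wrapping in $and means a
    # caller-supplied filter cannot replace the tenant clause, even if it
    # contains its own "tenant_id" key.
    tenant_clause = {"tenant_id": tenant_id}
    if not user_filter:
        return tenant_clause
    return {"$and": [tenant_clause, user_filter]}


if __name__ == "__main__":
    print(scoped_filter("acme", {"status": "open"}))
```

The resulting dictionary can be used as a normal MongoDB query filter. The point of the helper is that forgetting the tenant clause becomes a code-review problem in one place rather than in every query.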

I would go for variant 3 if:

  • You are going to have a small list of tenants (several hundred).
  • The specifics of the business require you to support big differences in the database structure for different tenants (e.g. integration with 3rd-party systems, import-export of data).
  • Your application design will allow customers (tenants) to make significant changes to the application at runtime (adding modules, customizing fields, etc.).
  • You have enough resources to scale out with new hardware nodes quickly.
  • You are required to keep versions/backups of data per tenant. Restores will also be easy.
  • There are legal/regulatory restrictions that force you to keep different tenants in different databases (or even data centers).
  • You want to fully utilize the out-of-the-box security features of MongoDB, such as roles.
  • There are big differences in size between tenants (you have many small tenants and a few very large tenants).
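
The per-tenant backup point is where variant 3 is simplest, because a backup is just a dump of one database. The sketch below only builds the `mongodump` argument list; the `tenant_` database prefix and the output directory layout are hypothetical.

```python
from typing import List


def dump_command(tenant_id: str, out_dir: str) -> List[str]:
    # mongodump's --db flag selects a single database and --out sets the
    # output directory, so each tenant ends up in its own backup folder.
    return [
        "mongodump",
        "--db", f"tenant_{tenant_id}",
        "--out", f"{out_dir}/{tenant_id}",
    ]


if __name__ == "__main__":
    # The list could be handed to subprocess.run on a machine with mongodump.
    print(" ".join(dump_command("acme", "/backups")))
```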

If you post additional details about your application, perhaps I can give you more detailed advice.

Answer by Braintapper

I found a good answer in the comments in this link:

http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/

Basically option #2 seems to be the best way to go.

Quote from David Mytton's comment:

We decided not to have a database per customer because of the way MongoDB allocates its data files. Each database uses it's own set of files:

The first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64MB, dbname.1 128MB, etc., up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB.

Thus if the last datafile present is say, 1GB, that file might be 90% empty if it was recently reached.

from the manual.

As users sign up to the trial and give things a go, we'd get more and more databases that were at least 2GB in size, even if the whole of the data file wasn't use. We found this used a massive amount of disk space compared to having several databases for all customers where the disk space can be used to maximum efficiency.

Sharding will be on a per collection basis as standard which presents a problem where the collection never reaches the minimum size to start sharding, as is the case for quite a few of ours (e.g. collections just storing user login details). However, we have requested that this will also be able to be done on a per database level. See http://jira.mongodb.org/browse/SHARDING-41

There are no performance tradeoffs using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
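
The file allocation rule quoted above (first file 64MB, each next file double the previous, capped at 2GB) can be turned into a small calculation to see why many tiny databases waste space. This is a sketch of that rule exactly as quoted, not a model of any particular MongoDB version; the function names and the example data sizes are made up.

```python
def data_file_sizes_mb(n_files, first_mb=64, cap_mb=2048):
    # Sizes of the first n_files data files: 64, 128, 256, ... capped at 2048 MB.
    sizes = []
    size = first_mb
    for _ in range(n_files):
        sizes.append(size)
        size = min(size * 2, cap_mb)
    return sizes


def allocated_mb(data_mb, first_mb=64, cap_mb=2048):
    # Disk space taken once enough files exist to hold data_mb of data.
    # At least one file is always allocated, even for an empty database.
    total, size = 0, first_mb
    while True:
        total += size
        if total >= data_mb:
            return total
        size = min(size * 2, cap_mb)


if __name__ == "__main__":
    print(data_file_sizes_mb(7))   # [64, 128, 256, 512, 1024, 2048, 2048]
    print(allocated_mb(10))        # 64
    print(allocated_mb(1000))      # 1984
```

Under this rule a trial tenant with 10MB of data still occupies 64MB, and a tenant with 1000MB occupies 1984MB because the 1024MB file has only just been started, which is the "mostly empty last file" effect the quote describes.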

Answer by TTT

I would go for option 2.

However, you could set the mongod.exe command line option --smallfiles. This means that the biggest file size of an extent will be 0.5 gigabytes rather than 2 gigabytes. I tested this with mongo 1.42. So option 3 is not impossible.

Answer by AJ.

There is a reasonable article on MSDN about multi-tenant data architecture which you might wish to refer to. Some key topics touched on by this article:

  • Economic considerations
  • Security
  • Tenant considerations
  • Regulatory (legal)
  • Skill set concerns

Also touched upon are some patterns for Software as a Service (SaaS) configuration.

Additionally, worth a gander is an interesting write-up from the SQL Anywhere guys.

My own personal take: unless you are certain of enforced security/trust, I would go with option 3, or, if scalability concerns prohibit that, fall back to option 2 at a minimum. That said... I'm no pro with MongoDB. I get pretty nervous using a shared "schema", but I will happily defer to more experienced practitioners.

Answer by Osleynin Mambell Ramos

According to my research in "MongoDB. Trucos y consejos. Aplicaciones multitenant.", that option is not recommended if you do not know how many tenants you will have: there could be thousands, which complicates sharding, and imagine having thousands of collections in a single database... So in your case option one is recommended. If you are going to have a limited number of users, the situation is different, and yes, you could use option two as you thought.
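
For what it's worth, here is a rough sketch of why option one fits sharding more naturally: a single shared collection can be sharded once, with the tenant field leading the shard key. The sketch builds a `shardCollection` admin command document; the compound key on a hypothetical `tenant_id` field plus `_id` is my own assumption, not something taken from the article.

```python
def shard_collection_command(database: str, collection: str,
                             tenant_field: str = "tenant_id") -> dict:
    # Admin command document for shardCollection. Leading the key with the
    # tenant field keeps each tenant's documents close together, and _id
    # lets a very large tenant still be split across chunks.
    return {
        "shardCollection": f"{database}.{collection}",
        "key": {tenant_field: 1, "_id": 1},
    }


if __name__ == "__main__":
    print(shard_collection_command("app", "invoices"))
```

With one collection per tenant, a command like this would have to be issued and maintained for every tenant collection, which is the complication mentioned above.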

Answer by Sumedh

While the discussion here is on NoSQL and primarily MongoDB, we at Citus are using PostgreSQL and building a distributed/sharded multi-tenant database.

Our use-case guide walks through an example app, covering the schema and various multi-tenant specific features.

For more unstructured data, we use PostgreSQL's JSONB columns to store such data along with tenant-specific data.
