MySQL 如何创建具有共享表结构的多租户数据库?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2213006/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 15:12:10  来源:igfitidea点击:

How to create a multi-tenant database with shared table structures?

sqlmysqldatabase-designmulti-tenant

提问by Marcel Hymanwerth

Our software currently runs on MySQL. The data of all tenants is stored in the same schema. Since we are using Ruby on Rails we can easily determine which data belongs to which tenant. However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.

我们的软件目前在 MySQL 上运行。所有租户的数据都存储在同一个模式中。由于我们使用的是 Ruby on Rails,因此我们可以轻松确定哪些数据属于哪个租户。但是,当然有些公司担心他们的数据可能会受到损害,因此我们正在评估其他解决方案。

So far I have seen three options:

到目前为止,我看到了三个选项:

  • Multi-Database (each tenant gets its own - nearly the same as 1 server per customer)
  • Multi-Schema (not available in MySQL, each tenant gets its own schema in a shared database)
  • Shared Schema (our current approach, maybe with additional identifying record on each column)
  • 多数据库(每个租户都有自己的 - 几乎相当于每个客户 1 个服务器)
  • 多架构(在 MySQL 中不可用,每个租户在共享数据库中都有自己的架构)
  • 共享模式(我们目前的方法,可能在每列上都有额外的识别记录)

Multi-Schema is my favourite (considering costs). However creating a new account and doing migrations seems to be quite painful, because I would have to iterate over all schemas and change their tables/columns/definitions.

多架构是我最喜欢的(考虑到成本)。然而,创建一个新帐户并进行迁移似乎非常痛苦,因为我必须遍历所有模式并更改它们的表/列/定义。

Q:Multi-Schema seems to be designed to have slightly different tables for each tenant - I don't want this. Is there any RDBMS which allows me to use a multi-schema multi-tenant solution, where the table structure is shared between all tenants?

问:Multi-Schema 似乎被设计为每个租户的表略有不同 - 我不想要这样。是否有任何 RDBMS 允许我使用多模式多租户解决方案,其中表结构在所有租户之间共享?

P.S. By multi I mean something like ultra-multi (10.000+ tenants).

PS 多我的意思是像超多(10.000+ 租户)。

采纳答案by Daniel Vassallo

However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.

但是,当然有些公司担心他们的数据可能会受到损害,因此我们正在评估其他解决方案。

This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.

这很不幸,因为客户有时会误以为只有物理隔离才能提供足够的安全性。

There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception towards the shared approach:

有一篇有趣的 MSDN 文章,标题为Multi-Tenant Data Architecture,您可能需要查看。这就是作者如何解决对共享方法的误解:

A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.

一个常见的误解认为只有物理隔离才能提供适当的安全级别。事实上,使用共享方法存储的数据也可以提供强大的数据安全性,但需要使用更复杂的设计模式。

As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:

至于技术和业务方面的考虑,本文简要分析了某种方法可能比另一种方法更合适的地方:

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.

  • How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.

  • How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)

  • How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.

  • Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.

您期望服务的租户的数量、性质和需求都会以不同的方式影响您的数据架构决策。以下一些问题可能会使您偏向于更孤立的方法,而其他问题可能会使您偏向于更共享的方法。

  • 您希望针对多少潜在租户?您可能无法通过权威来估计预期用途,但要从数量级的角度考虑:您是否正在为数百个租户构建应用程序?几千?成千上万?更多的?您期望租户基础越大,您就越有可能考虑采用更共享的方法。

  • 您预计平均租户的数据会占用多少存储空间?如果您希望部分或所有租户存储大量数据,那么单独的数据库方法可能是最好的。(实际上,数据存储要求可能会迫使您采用单独的数据库模型。如果是这样,那么从一开始就以这种方式设计应用程序比以后转向单独的数据库方法要容易得多。)

  • 您希望平均租户支持多少并发最终用户?数字越大,越孤立的方法越适合满足最终用户的要求。

  • 您是否希望提供任何按租户的增值服务,例如按租户的备份和恢复功能?通过更孤立的方法更容易提供此类服务。



UPDATE:Further to update about the expected number of tenants.

更新:进一步更新有关预期租户数量的信息。

That expected number of tenants (10k) should exclude the multi-database approach, for most, if not all scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.

对于大多数情况(如果不是所有情况),预期的租户数量 (10k) 应排除多数据库方法。我认为您不会喜欢维护 10,000 个数据库实例并且每天必须创建数百个新实例的想法。

From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50Mb per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.

仅从该参数来看,共享数据库、单模式方法似乎是最合适的。事实上,您将为每个租户存储大约 50Mb,并且没有每个租户的附加组件,这使得这种方法更加合适。

The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:

上面引用的 MSDN 文章提到了三种解决共享数据库方法安全注意事项的安全模式:

When you are confident with your application's data safety measures, you would be able to offer your clients a Service Level Agrementthat provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would be taking to ensure that data is not compromised.

当您对应用程序的数据安全措施充满信心时,您将能够为您的客户提供服务级别协议,以提供强大的数据安全保证。在您的 SLA 中,除了保证之外,您还可以描述您将采取哪些措施来确保数据不受损害。

UPDATE 2:Apparently the Microsoft guys moved / made a new article regarding this subject, the original link is gone and this is the new one: Multi-tenant SaaS database tenancy patterns(kudos to Shai Kerer)

更新 2:显然微软的人移动/制作了一篇关于这个主题的新文章,原来的链接不见了,这是新的:多租户 SaaS 数据库租赁模式(感谢 Shai Kerer)

回答by AdaTheDev

My experience (albeit SQL Server) is that multi-database is the way to go, where each client has their own database. So although I have no mySQL or Ruby On Rails experience, I'm hoping my input might add some value.

我的经验(尽管是 SQL Server)是多数据库是要走的路,每个客户端都有自己的数据库。因此,尽管我没有 mySQL 或 Ruby On Rails 经验,但我希望我的输入可能会增加一些价值。

The reasons why include :

原因包括:

  1. data security/disaster recovery. Each companies data is stored entirely separately from others giving reduced risk of data being compromised (thinking things like if you introduce a code bug that means something mistakenly looks at other client data when it shouldn't), minimizes potential loss to one client if one particular database gets corrupted etc. The perceived security benefits to the client are even greater (added bonus side effect!)
  2. scalability. Essentially you'd be partitioning your data out to enable greater scalability - e.g. databases can be put on to different disks, you could bring multiple database servers online and move databases around easier to spread the load.
  3. performance tuning. Suppose you have one very large client and one very small. Usage patterns, data volumes etc. can vary wildly. You can tune/optimise easier for each client should you need to.
  1. 数据安全/灾难恢复。每家公司的数据都与其他公司的数据完全分开存储,从而降低了数据被泄露的风险(想想如果你引入了一个代码错误,这意味着某些东西在不应该的时候错误地查看了其他客户数据),最大限度地减少了一个客户的潜在损失,如果有的话特定的数据库被损坏等。对客户端的感知安全优势甚至更大(额外的副作用!)
  2. 可扩展性。从本质上讲,您可以将数据分区以实现更大的可扩展性——例如,数据库可以放在不同的磁盘上,您可以将多个数据库服务器联机并更容易地移动数据库以分散负载。
  3. 性能调优。假设您有一个非常大的客户和一个非常小的客户。使用模式、数据量等可能千差万别。如果需要,您可以更轻松地为每个客户端调整/优化。

I hope this does offer some useful input! There are more reasons, but my mind went blank. If it kicks back in, I'll update :)

我希望这确实提供了一些有用的输入!还有更多的原因,但我的大脑一片空白。如果它重新启动,我会更新:)

EDIT:
Since I posted this answer, it's now clear that we're talking 10,000+ tenants. My experience is in hundreds of large scale databases - I don't think 10,000 separate databases is going to be too manageable for your scenario, so I'm now not favouring the multi-db approach for your scenario. Especially as it's now clear you're talking small data volumes for each tenant!

编辑:
自从我发布了这个答案,现在很明显我们在谈论 10,000 多个租户。我的经验是在数百个大型数据库中 - 我认为 10,000 个单独的数据库对于您的场景来说不会太易于管理,所以我现在不赞成您的场景使用多数据库方法。特别是现在很明显,您正在谈论每个租户的小数据量!

Keeping my answer here as anyway as it may have some use for other people in a similar boat (with fewer tenants)

无论如何,将我的答案保留在这里,因为它可能对类似船上的其他人有用(租户较少)

回答by dana

Below is a link to a white-paper on Salesforce.com about how they implement multi-tenancy:

以下是 Salesforce.com 上有关他们如何实施多租户的白皮书的链接:

http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf

http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf

They have 1 huge table w/ 500 string columns (Value0, Value1, ... Value500). Dates and Numbers are stored as strings in a format such that they can be converted to their native types at the database level. There are meta data tables that define the shape of the data model which can be unique per tenant. There are additional tables for indexing, relationships, unique values etc.

他们有 1 个带有 500 个字符串列的巨大表(Value0、Value1、...Value500)。日期和数字以某种格式存储为字符串,以便它们可以在数据库级别转换为它们的本机类型。有定义数据模型形状的元数据表,每个租户可以是唯一的。还有用于索引、关系、唯一值等的附加表。

Why the hassle?

为什么这么麻烦?

Each tenant can customize their own data schema at run-time without having to make changes at the database level (alter table etc). This is definitely the hard way to do something like this but is very flexible.

每个租户都可以在运行时自定义自己的数据模式,而无需在数据库级别(更改表等)进行更改。这绝对是做这样的事情的艰难方法,但非常灵活。

回答by CraigKerstiens

As you mention the one database per tenant is an option and does have some larger trade-offs with it. It can work well at smaller scale such as a single digit or low 10's of tenants, but beyond that it becomes harder to manage. Both just the migrations but also just in keeping the databases up and running.

正如您提到的每个租户一个数据库是一种选择,并且确实有一些更大的权衡。它可以在较小的规模下运行良好,例如个位数或低 10 的租户,但除此之外,它变得更难管理。不仅是迁移,而且还只是保持数据库正常运行。

The per schema model isn't only useful for unique schemas for each, though still running migrations across all tenants becomes difficult and at 1000's of schemas Postgres can start to have troubles.

每个模式模型不仅对每个模式的独特模式有用,尽管跨所有租户运行迁移仍然变得困难,并且在 1000 个模式下 Postgres 可能开始遇到麻烦。

A more scalable approach is absolutely having tenants randomly distributed, stored in the same database, but across different logical shards (or tables). Depending on your language there are a number of libraries that can help with this. If you're using Rails there is a library to enfore the tenancy acts_as_tenant, it helps ensure your tenant queries only pull back that data. There's also a gem apartment- though it uses the schema model it does help with the migrations across all schemas. If you're using Django there's a number but one of the more popular ones seems to be across schemas. All of these help more at the application level. If you're looking for something more at the database level directly, Citusfocuses on making this type of sharding for multi-tenancywork more out of the box with Postgres.

一种更具可扩展性的方法绝对是让租户随机分布,存储在同一个数据库中,但跨越不同的逻辑分片(或)。根据您的语言,有许多库可以帮助解决这个问题。如果您正在使用 Rails,则有一个库来强制租用acts_as_tenant,它有助于确保您的租户查询仅拉回该数据。还有一个 gem apartment- 尽管它使用模式模型,但它确实有助于跨所有模式的迁移。如果您使用的是 Django,则有一个数字,但其中一个更流行的似乎是跨模式的。所有这些都在应用程序级别提供了更多帮助。如果您直接在数据库级别寻找更多的东西,Citus专注于为使用 Postgres 可以让多租户工作更加开箱即用。