Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/1384909/

Time: 2020-09-19 23:51:03  Source: igfitidea

Postgresql one db with multiple schemas vs multiple db with one schema

Tags: database, database-design, postgresql

Asked by snahor

I've been reading this question, but it doesn't help me. From a db administration point of view, I think one db with multiple schemas is easier to maintain than the other option, but in terms of performance, which one is better? Is there any advantage of one over the other?

TIA.

Answered by derobert

If performance is important, there is no substitute for benchmarking your specific use case. If it isn't that important, go with whichever is easier to administer! Hardware is cheap compared to programmer/DBA time, and compared to the higher accident rate you'd expect on a more complex setup. Not to mention that computers follow Moore's law, but the day stubbornly refuses to get longer.

If I had to guess, having not benchmarked your specific use case (since I can't), I'd guess one database with multiple schemas will be better performing because:

  1. Each connection is to one database, AFAIK. Connection setup/teardown is expensive in PostgreSQL.

  2. Many schemas are closer to many tables than many databases are, and I'd expect PostgreSQL to be better optimized for many tables than for many databases.
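
To make the first point concrete: with schemas, an already-open connection can be pointed at another schema with a single statement instead of a fresh connection setup. A minimal sketch in Python that only builds that statement (the helper name and the `tenant_a` schema are made up for illustration, and the identifier check is deliberately crude):

```python
def switch_schema_sql(schema):
    """Build the SQL that points an existing connection at another schema.

    Hypothetical helper: it only returns a string and does not talk to a server.
    """
    # Crude identifier check (letters, digits, underscores) so the name can be
    # interpolated directly; real code should use the driver's identifier quoting.
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"unexpected schema name: {schema!r}")
    return f"SET search_path TO {schema}, public;"


statement = switch_schema_sql("tenant_a")
print(statement)
```

With one database per tenant, the same switch would mean opening a connection to a different database.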

However, I can see a counterexample which may apply. Each database is stored in a single directory. This makes splitting databases across filesystems (and thus disk arrays) really easy using normal filesystem tools (e.g., mount points and/or symbolic links). Splitting databases across multiple arrays is very unlikely to outperform one larger RAID10 array with the same number of disks, but will provide better isolation (database A doing a huge query will not affect database B as much). Do, however, check your OS's IO scheduler documentation; remember that each PostgreSQL connection gets its own backend process, so per-process fairness queuing may accomplish this better.

Note that you can also segment the data across filesystems using PostgreSQL's CREATE TABLESPACE and friends, so the above can actually be done with schemas as well.
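
As a rough illustration of the tablespace route, here is a small Python sketch that only generates the SQL involved. The tablespace name, mount point, schema, and table are all hypothetical, and the names are interpolated without quoting, so treat it as a sketch rather than something to paste into production:

```python
def tablespace_statements(tablespace, location, schema, table):
    """Return SQL that puts one schema's table on a separate filesystem.

    All names are illustrative; the directory must already exist and be
    owned by the PostgreSQL OS user for CREATE TABLESPACE to succeed.
    """
    return [
        # Register a directory (e.g. a mount point on another array) as a tablespace.
        f"CREATE TABLESPACE {tablespace} LOCATION '{location}';",
        f"CREATE SCHEMA {schema};",
        # Place this schema's table in that tablespace.
        f"CREATE TABLE {schema}.{table} (id bigint PRIMARY KEY) TABLESPACE {tablespace};",
    ]


statements = tablespace_statements("array2_space", "/mnt/array2/pgdata", "tenant_a", "orders")
for statement in statements:
    print(statement)
```

Existing tables can be moved later with ALTER TABLE ... SET TABLESPACE, which rewrites the table's files in the new location.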

Answered by Magnus Hagander

Performance-wise, it's going to depend completely on your application.

For example, multiple databases require connection pooling per database. If you have hundreds or thousands of databases, that pretty much means you can't do connection pooling. That'll cost you performance for anything except, say, a client application with a single persistent connection to the database.
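
A toy model of why that is, in Python. This is not a real pooler, just a counter: it assumes an idle connection can only be reused for the database it was opened against, and it ignores concurrency, idle timeouts, and pool size limits. The class and the `tenant_N` / `shared` names are invented for the sketch:

```python
from collections import defaultdict


class ToyPool:
    """Toy pool: idle connections are only reusable for the same database."""

    def __init__(self):
        self.idle = defaultdict(list)  # database name -> idle connections
        self.opened = 0                # number of (expensive) connection setups

    def acquire(self, database):
        if self.idle[database]:
            return self.idle[database].pop()  # reuse, no setup cost
        self.opened += 1                      # no idle connection for this database
        return (database, self.opened)

    def release(self, connection):
        database, _ = connection
        self.idle[database].append(connection)


def setups_needed(requests):
    """Serve requests one after another and count connection setups."""
    pool = ToyPool()
    for database in requests:
        connection = pool.acquire(database)
        pool.release(connection)
    return pool.opened


# 1000 sequential requests: one database per tenant vs. one shared database.
many_databases = setups_needed([f"tenant_{i}" for i in range(1000)])
one_database = setups_needed(["shared"] * 1000)
print(many_databases, one_database)
```

In this model every request to a not-yet-seen database pays for a new connection, while the shared-database case keeps reusing the same one.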

However, if you only access "one database at a time" (and not within seconds of each other either), keeping things in separate databases will only need to load the system table cache for the databases that are actively being used, leaving more memory to cache user tables (since the system tables in each database will be significantly smaller).

In most cases, schemas in one database will win out. A fairly common solution is a hybrid of X databases and Y schemas.

Answered by Jeff Davis

Multiple databases have no performance advantage over multiple schemas (namespaces) that I can see (except perhaps if you have an incredibly large number of tables). CREATE TABLESPACE allows you to put whatever you want wherever you want on the filesystem, so the physical storage can be controlled by you in either case.

The main difference is that querying across multiple schemas works much better than querying across multiple databases. Also, you may be able to share more connections via a connection pooler if it's all in the same database.
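
To show what a cross-schema query can look like, here is a Python sketch that builds one statement covering several schemas over a single connection. The schema names and the `orders` / `amount` table and column are assumptions for the example, and identifiers are interpolated without quoting:

```python
def per_schema_totals_sql(schemas, table="orders", column="amount"):
    """Build one query that totals a column in the same table of each schema.

    Hypothetical helper: assumes every schema has an identically shaped table.
    """
    selects = [
        f"SELECT '{schema}' AS tenant, sum({column}) AS total FROM {schema}.{table}"
        for schema in schemas
    ]
    return "\nUNION ALL\n".join(selects) + ";"


query = per_schema_totals_sql(["tenant_a", "tenant_b"])
print(query)
```

With separate databases, a single statement cannot reference the other database's tables directly; you would need something like the dblink or postgres_fdw extensions, or to merge the results in the application.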

Answered by Martin v. Löwis

I don't think it matters either way. Each table will be stored in a separate file; the only question is which directory the file lives in. Since performance is impacted primarily by read and write operations on individual files, the organization of files into directories should have little effect.
