SQL: How big is too big for a PostgreSQL table?

Disclaimer: the content on this page comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow. Original question: http://stackoverflow.com/questions/21866113/

Date: 2020-09-01 01:06:53  Source: igfitidea

How big is too big for a PostgreSQL table?

sql, ruby-on-rails, performance, postgresql, database-design

Asked by Dylan Karr

I'm working on the design for a RoR project for my company, and our development team has already run into a bit of a debate about the design, specifically the database.

We have a model called Message that needs to be persisted. It's a very, very small model with only three db columns other than the id, however there will likely be A LOT of these models when we go to production. We're looking at as many as 1,000,000 insertions per day. The models will only ever be searched by two foreign keys on them, which can be indexed. As well, the models never have to be deleted, but we also don't have to keep them once they're about three months old.

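For illustration only, a table matching that description might look something like this (the table and column names are hypothetical, not from the original question):

-- hypothetical schema: an id plus three columns, two of them foreign keys;
-- the third is shown as a timestamp because the 3-month retention needs one to prune on
create table messages (
    id         bigserial primary key,
    sender_id  bigint not null,                      -- FK to a (hypothetical) users table
    channel_id bigint not null,                      -- FK to a (hypothetical) channels table
    created_at timestamptz not null default now()
);

-- the two lookup paths mentioned in the question
create index messages_sender_id_idx on messages (sender_id);
create index messages_channel_id_idx on messages (channel_id);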

So, what we're wondering is whether implementing this table in Postgres will present a significant performance issue. Does anyone have experience with very large SQL databases to tell us whether or not this will be a problem? And if so, what alternative should we go with?

Accepted answer by Kuberchaun

The number of rows in a table won't be an issue on its own.

So roughly speaking, 1 million rows a day for 90 days is 90 million rows. Without knowing all the details of what you are doing, I see no reason Postgres can't deal with that.

Depending on your data distribution, you can use a mixture of indexes, partial (filtered) indexes, and some kind of table partitioning to speed things up once you see which performance issues you actually have. Your problem would be the same on any other RDBMS that I know of. If you only need 3 months' worth of data, design a process to prune off the data you no longer need; that way you keep a consistent volume of data in the table. You're lucky that you know how much data will exist, so test at your expected volume and see what you get. Creating a test table with 90 million rows can be as easy as:

-- build a 90-million-row test table with three integer columns
create table message_test as
select x, 1 as c2, 2 as c3
from generate_series(1,90000000) x;
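
The answer mentions "table partitioning of some kind" plus a pruning process; one way those can be combined is to partition by month and drop whole partitions once they age out of the three-month window. This is only a sketch, assuming declarative partitioning (PostgreSQL 10 and later) and hypothetical table and column names:

-- hypothetical monthly partitioning sketch (PostgreSQL 10+ declarative partitioning)
create table messages (
    id         bigint not null,
    sender_id  bigint not null,
    channel_id bigint not null,
    created_at timestamptz not null
) partition by range (created_at);

create table messages_2020_07 partition of messages
    for values from ('2020-07-01') to ('2020-08-01');
create table messages_2020_08 partition of messages
    for values from ('2020-08-01') to ('2020-09-01');
create table messages_2020_09 partition of messages
    for values from ('2020-09-01') to ('2020-10-01');

-- pruning a month that has aged out of the retention window is then a fast
-- metadata operation (dropping one partition) rather than a slow bulk delete
drop table messages_2020_07;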

https://wiki.postgresql.org/wiki/FAQ

Limit   Value
Maximum Database Size       Unlimited
Maximum Table Size          32 TB
Maximum Row Size            1.6 TB
Maximum Field Size          1 GB
Maximum Rows per Table      Unlimited
Maximum Columns per Table   250 - 1600 depending on column types
Maximum Indexes per Table   Unlimited

Answered by James Doherty

Another way to speed up your queries significantly on a table with > 100 million rows is to CLUSTER the table, during off hours, on the index that is most often used in your queries. We have a table with > 218 million rows and have seen 30X improvements.

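A minimal sketch of that approach, with hypothetical table and index names; note that CLUSTER takes an exclusive lock on the table while it runs, which is why the off-hours timing matters:

-- physically reorder the table on the index most of your queries use
cluster messages using messages_channel_id_idx;

-- CLUSTER is a one-time rewrite: rows inserted afterwards are not kept in that
-- order, so the command is typically re-run periodically during quiet hours
analyze messages;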

Also, for a very large table, it's a good idea to create indexes on your foreign keys.

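Postgres does not create these indexes automatically when a foreign key is declared, so on an already large table they might be added like this (hypothetical names; CONCURRENTLY avoids blocking writes while the index builds, at the cost of being slower and not working inside a transaction block):

-- index the referencing (foreign-key) columns used for lookups
create index concurrently messages_sender_id_idx on messages (sender_id);
create index concurrently messages_channel_id_idx on messages (channel_id);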