Greenplum vs PostgreSQL

Notice: this page is a Chinese/English rendering of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/5310994/

Date: 2020-10-20 22:54:06  Source: igfitidea

Tags: sql, database, django, postgresql, greenplum

Asked by 0atman

What are the arguments for and against using Greenplum instead of PostgreSQL in a webapp (Django) environment?

My gut reaction is to prefer PostgreSQL's open-source approach and huge knowledgebase.

My configuration (though I'd love to hear about any other configuration) is a medium-sized business with 2 web servers and (at the moment) 2 database servers.

Areas to contrast are binary data crunching, the number of nodes in replication, and my personal favorites: community support and skilled engineer support.

What are the pros and cons of using Greenplum instead of PostgreSQL?

Accepted answer by duffymo

I don't know much about Greenplum beyond quickly skimming the link you sent. A data warehouse is not the same thing as a transactional operational data store. The former is for ad hoc queries, statistical analysis, dimensional analysis, and read-mostly access to historical data. The latter is for real-time reads and writes of operational data. They're complementary.

I'm guessing that you want PostgreSQL.

Who is pushing Greenplum on you and why? If it's being presented as an alternative, I'd dig deeper and rebut the argument.

Answer by Bart K

Greenplum is an MPP (massively parallel processing) adaptation of PostgreSQL. It's optimized for warehousing and/or analytics on large sets of data and would not perform that well in a transactional environment. If you need a large DW environment, look at Greenplum. If you need OLTP or smaller DB sizes (under 10TB), then look at PostgreSQL.

Answer by 0x0FFF

Greenplum is an MPP analytical (OLAP) DBMS. PostgreSQL is an OLTP DBMS. In general, no single solution on the market is good at both OLAP and OLTP at the same time; you can find my thoughts on it here.

A webapp backend will always create an OLTP workload. Greenplum, being a distributed system, has a large overhead for transaction processing, so don't expect it to deliver more than 500-600 TPS. Postgres, in contrast, can reach hundreds of thousands of TPS with the right tuning.

Conversely, for an OLAP workload, Postgres can offer only single-host processing: no partitioning with dynamic partition elimination, no compression, no columnar store. Greenplum, on the other hand, can crunch your data in parallel across the cluster.

So the solution you are looking for is the typical data warehouse case: use an OLTP solution for the high transactional workload, extract the data to the DWH with ETL/ELT, and then run complex data-crunching queries there.
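
To make the OLTP-then-ETL-then-DWH flow a bit more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: two in-memory SQLite databases stand in for PostgreSQL (the OLTP side) and Greenplum (the warehouse side), and the table names `orders` and `fact_orders` as well as the helper functions are made up for this example.

```python
import sqlite3


def make_oltp() -> sqlite3.Connection:
    """Stand-in for the transactional store (PostgreSQL in the answer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
    )
    # Many small writes like these are the typical webapp (OLTP) workload.
    conn.executemany(
        "INSERT INTO orders (day, amount) VALUES (?, ?)",
        [("2020-10-19", 10.0), ("2020-10-19", 5.5), ("2020-10-20", 7.25)],
    )
    conn.commit()
    return conn


def make_dwh() -> sqlite3.Connection:
    """Stand-in for the warehouse (Greenplum in the answer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_orders "
        "(order_id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
    )
    return conn


def etl(oltp: sqlite3.Connection, dwh: sqlite3.Connection, since_id: int = 0) -> int:
    """Copy orders with id > since_id into the warehouse.

    Returns the highest id copied so the next run can continue from there.
    """
    rows = oltp.execute(
        "SELECT id, day, amount FROM orders WHERE id > ? ORDER BY id",
        (since_id,),
    ).fetchall()
    dwh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    dwh.commit()
    return rows[-1][0] if rows else since_id


def daily_revenue(dwh: sqlite3.Connection) -> list:
    """An analytical (OLAP-style) query that runs on the warehouse only."""
    return dwh.execute(
        "SELECT day, SUM(amount) FROM fact_orders GROUP BY day ORDER BY day"
    ).fetchall()
```

Calling `etl(oltp, dwh, since_id=last_id)` on a schedule keeps the warehouse incrementally up to date, while the webapp keeps writing only to the OLTP side. A real pipeline would also need to handle updates and deletes, which this sketch ignores.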

At the moment both PostgreSQL and Greenplum are open source products, so you are free to choose either of them, but of course the PostgreSQL community is currently bigger.

Answer by David

Since Greenplum uses parallel processing, running lots of tiny read queries incurs overhead, because the master node needs to communicate with the underlying data nodes to retrieve the answers to all these queries. For a query that takes milliseconds, expect Greenplum to be an order of magnitude slower.
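
A back-of-the-envelope model shows why the same overhead hurts tiny queries but barely matters for long ones. This is a rough sketch, not a benchmark: the fixed coordination cost, the segment count, and the assumption that work divides evenly across segments are all made up for illustration.

```python
def single_node_ms(work_ms: float) -> float:
    """Latency on one host: just the work itself."""
    return work_ms


def mpp_ms(work_ms: float, segments: int, overhead_ms: float) -> float:
    """Latency on a cluster: fixed master/segment coordination cost
    plus the work divided evenly across segments (an idealization)."""
    return overhead_ms + work_ms / segments


def speedup(work_ms: float, segments: int, overhead_ms: float) -> float:
    """Values above 1 mean the cluster is faster; below 1, slower."""
    return single_node_ms(work_ms) / mpp_ms(work_ms, segments, overhead_ms)


if __name__ == "__main__":
    SEGMENTS = 8          # hypothetical cluster size
    OVERHEAD_MS = 20.0    # hypothetical per-query coordination cost
    for label, work_ms in (("2 ms lookup", 2.0), ("1 hour crunch", 3_600_000.0)):
        print(label, round(speedup(work_ms, SEGMENTS, OVERHEAD_MS), 3))
```

With these assumed numbers the short lookup comes out roughly ten times slower on the cluster, while the hour-long job gets close to the full eight-fold speedup, which matches the shape of the argument above even though the exact figures are invented.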

Answer by LotsOfData

If you are looking for a PostgreSQL-based data warehousing solution, I would also look at GridSQL. It is a parallelization layer over multiple PostgreSQL instances, and is free and open source.

As mentioned in other comments, it will not perform well for many small, millisecond-scale queries, but it will help greatly with long-running queries. GridSQL also does not include DW optimizations such as the columnar storage that Greenplum has, but you can take advantage of constraint exclusion partitioning (e.g. subtables by date range) combined with parallelism to get your query results faster.
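
The idea of combining date-range partitions with parallelism can be sketched in plain Python. This is a toy model, not how GridSQL or PostgreSQL implement it: the `Partition` class and the `prune`, `scan`, and `total` helpers are hypothetical names, and a thread pool stands in for separate PostgreSQL instances (threads here only illustrate the fan-out, not real CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    start: date  # inclusive lower bound, like a CHECK constraint on the subtable
    end: date    # exclusive upper bound
    rows: list   # (day, amount) tuples stored in this subtable


def prune(partitions, lo: date, hi: date):
    """Skip partitions whose [start, end) range cannot overlap [lo, hi).

    This mirrors what constraint exclusion does: subtables whose
    constraints contradict the WHERE clause are never scanned.
    """
    return [p for p in partitions if p.start < hi and lo < p.end]


def scan(partition: Partition, lo: date, hi: date) -> float:
    """Sum the matching rows inside one surviving partition."""
    return sum(amount for day, amount in partition.rows if lo <= day < hi)


def total(partitions, lo: date, hi: date, workers: int = 4) -> float:
    """Scan the surviving partitions concurrently and combine the results."""
    survivors = prune(partitions, lo, hi)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: scan(p, lo, hi), survivors))
```

The speedup comes from two places: pruning shrinks the amount of data touched at all, and the remaining partitions can be scanned side by side. For a query that hits only one small partition, the second effect disappears, which is consistent with the point about short queries.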

You can even use it on a single multi-core server, as PostgreSQL will only use a single core when processing a query.

Answer by Mike Sherrill 'Cat Recall'

I think Greenplum takes better advantage of parallel processing. It's based on PostgreSQL, though.

Greenplum has a free community edition. You can always download and test in your own environment.

Answer by user677325

If any data crunching takes longer than an hour, you'll get linear performance boosts for every core you add. It's not really worth the effort for anything that takes less time to crunch through.
