postgresql 流复制和逻辑复制的区别
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/33621906/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Difference between Stream Replication and logical replication
提问by Simon Su
Could anybody tell me more about difference between physical replication and logical replication in PostgreSQL?
有人能告诉我更多关于 PostgreSQL 中物理复制和逻辑复制之间的区别吗?
回答by Craig Ringer
TL;DR: Logical replication sends row-by-row changes, physical replication sends disk block changes. Logical replication is better for some tasks, physical replication for others.
TL;DR:逻辑复制发送逐行更改,物理复制发送磁盘块更改。逻辑复制更适合某些任务,物理复制更适合其他任务。
As of 9.5 logical replication is fairly immature. Use physical replication if you are asking this question.
从 9.5 开始,逻辑复制相当不成熟。如果您问这个问题,请使用物理复制。
Streaming replication can belogical replication. It's all a bit complicated.
流式复制可以是逻辑复制。这一切都有些复杂。
WAL-shipping vs streaming
WAL 运输与流媒体
There are two main ways to send data from master to replica in PostgreSQL:
在 PostgreSQL 中有两种主要的方式将数据从 master 发送到 replica:
WAL-shippingor continuous archiving, where individual write-ahead-log files are copied from
pg_xlog
by thearchive_command
running on the master to some other location. Arestore_command
configured in the replica'srecovery.conf
runs on the replica to fetch the archives so the replica can replay the WAL.This is what's used for point-in-time replication(PITR), which is used as a method of continuous backup.
No direct network connection is required to the master server. Replication can have long delays, especially without an
archive_timeout
set. WAL shipping cannot be used for synchronous replication.streaming replication, where each change is sent to one or more replica servers directly over a TCP/IP connection as it happens. The replicas must have a direct network connection the master configured in their
recovery.conf
'sprimary_conninfo
option.Streaming replication has little or no delay so long as the replica is fast enough to keep up. It can be used for synchronous replication. You cannot use streaming replication for PITR1so it's not much use for continuous backup. If you drop a table on the master, oops, it's dropped on the replicas too.
WAL-shipping或连续归档,其中单独的预写日志文件
pg_xlog
由archive_command
主服务器上的运行复制到其他位置。restore_command
副本中配置的A在副本上recovery.conf
运行以获取档案,以便副本可以重放 WAL。这是用于时间点复制(PITR) 的内容,用作连续备份的方法。
不需要到主服务器的直接网络连接。复制可能会有很长的延迟,尤其是在没有
archive_timeout
集合的情况下。WAL 传送不能用于同步复制。流式复制,其中每个更改都会在发生时直接通过 TCP/IP 连接发送到一个或多个副本服务器。副本必须有一个直接的网络连接,主节点在它们
recovery.conf
的primary_conninfo
选项中配置。只要副本足够快以跟上,流式复制几乎没有延迟。它可用于同步复制。您不能对 PITR 1使用流复制,因此它对连续备份没有多大用处。如果你在主服务器上删除一个表,糟糕,它也会被删除到副本上。
Thus, the two methods have different purposes.
因此,这两种方法具有不同的目的。
Asynchronous vs synchronous streaming
异步与同步流
On top of that, there's synchronousand asynchronousstreaming replication:
最重要的是,还有同步和异步流复制:
In asynchronous streaming replicationthe replica(s) are allowed to fall behind the master in time when the master is faster/busier. If the master crashes you might lose data that wasn't replicated yet.
If the asynchronous replica falls too far behind the master, the master might throw away information the replica needs if
wal_keep_segments
is too low and no slot is used, meaning you have to re-create the replica from scratch. Or the master'spg_xlog
might fill up and stop the master from working until disk space is freed ifwal_keep_segments
is too high or a slot is used.In synchronous replicationthe master doesn't finish committing until a replica has confirmed it received the transaction2. You never lose data if the master crashes and you have to fail over to a replica. The master will never throw away data the replica needs or fill up its xlog and run out of disk space because of replica delays. In exchange it can cause the master to slow down or even stop working if replicas have problems, and it always has some performance impact on the master due to network latency.
When there are multiple replicas, only one is synchronous at a time. See
synchronous_standby_names
.
在异步流复制中,当 master 更快/更忙时,允许副本及时落后于 master。如果 master 崩溃,您可能会丢失尚未复制的数据。
如果异步副本落后于主副本太远,主副本可能会丢弃副本所需的信息,如果
wal_keep_segments
太低并且没有使用插槽,这意味着您必须从头开始重新创建副本。或者,pg_xlog
如果磁盘空间wal_keep_segments
太高或使用了插槽,则主机可能会填满并停止主机工作,直到释放磁盘空间。在同步复制中,在副本确认收到事务2之前,主节点不会完成提交。如果 master 崩溃并且您必须故障转移到副本,您永远不会丢失数据。主节点永远不会因为副本延迟而丢弃副本所需的数据或填满其 xlog 并耗尽磁盘空间。作为交换,如果副本出现问题,它会导致主节点变慢甚至停止工作,并且由于网络延迟,它总是对主节点产生一些性能影响。
当有多个副本时,一次只有一个是同步的。见
synchronous_standby_names
。
You can't have synchronous log shipping.
您不能进行同步日志传送。
You can actually combine log shipping and asynchronous replication to protect against having to recreate a replica if it falls too far behind, without risking affecting the master. This is an ideal configuration for many deployments, combined with monitoring how far the replica is behind the master to ensure it's within acceptable disaster recovery limits.
您实际上可以将日志传送和异步复制结合起来,以防止在落后太多时必须重新创建副本,而不会影响主服务器。这是许多部署的理想配置,结合监控副本在主服务器后面的距离以确保其在可接受的灾难恢复限制范围内。
Logical vs physical
逻辑与物理
On top of thatwe have logicalvs physicalstreaming replication, as introduced in PostgreSQL 9.4:
最重要的是,我们有逻辑与物理流复制,如 PostgreSQL 9.4 中所介绍的:
In physical streaming replicationchanges are sent at nearly disk block level, like "at offset 14 of disk page 18 of relation 12311, wrote tuple with hex value 0x2342beef1222....".
Physical replication sends everything: the contents of every database in the PostgreSQL install, all tables in every database. It sends index entries, it sends the whole new table data when you
VACUUM FULL
, it sends data for transactions that rolled back, etc. So it generates a lot of "noise" and sends a lot of excess data. It also requires the replica to be completely identical, so you cannot do anything that'd require a transaction, like creating temp or unlogged tables. Querying the replica delays replication, so long queries on the replica need to be cancelled.In exchange, it's simple and efficient to apply the changes on the replica, and the replica is reliably exactly the same as the master. DDL is replicated transparently, just like everything else, so it requires no special handling. It can also stream big transactions as they happen, so there is little delay between commit on the master and commit on the replica even for big changes.
Physical replication is mature, well tested, and widely adopted.
logical streaming replication, new in 9.4, sends changes at a higher level, and much more selectively.
It replicates only one database at a time. It sends only row changes and only for committed transactions, and it doesn't have to send vacuum data, index changes, etc. It can selectively send data only for some tables within a database. This makes logical replication muchmore bandwidth-efficient.
Operating at a higher level also means that you can do transactions on the replica databases. You can create temporary and unlogged tables. Even normal tables, if you want. You can use foreign data wrappers, views, create functions, whatever you like. There's no need to cancel queries if they run too long either.
Logical replication can also be used to build multi-master replication in PostgreSQL, which is not possible using physical replication.
In exchange, though, it can't (currently) stream big transactions as they happen. It has to wait until they commit. So there can be a long delay between a big transaction committing on the master and being applied to the replica.
It replays transactions strictly in commit order, so small fast transactions can get stuck behind a big transaction and be delayed quite a while.
DDL isn't handled automatically. You have to keep the table definitions in sync between master and replica yourself, or the application using logical replication has to have its own facilities to do this. It can be complicated to get this right.
The apply process its self is more complicated than "write some bytes where I'm told to" as well. It also takes more resources on the replica than physical replication does.
Current logical replication implementations are not mature or widely adopted, or particularly easy to use.
在物理流复制中,更改几乎是在磁盘块级别发送的,例如“在关系 12311 的磁盘页 18 的偏移量 14 处,写入了十六进制值为 0x2342beef1222 的元组......”。
物理复制发送一切:PostgreSQL 安装中每个数据库的内容,每个数据库中的所有表。它发送索引条目,发送整个新表数据
VACUUM FULL
,发送回滚事务的数据等。因此它会产生大量“噪音”并发送大量多余数据。它还要求副本完全相同,因此您不能执行任何需要事务的操作,例如创建临时表或未记录的表。查询副本会延迟复制,因此需要取消对副本的长时间查询。作为交换,在副本上应用更改既简单又高效,并且副本与主副本可靠地完全相同。DDL 是透明复制的,就像其他一切一样,因此不需要特殊处理。它还可以在大事务发生时流式传输,因此即使发生大的更改,在主服务器上提交和在副本上提交之间也几乎没有延迟。
物理复制已经成熟、经过充分测试并被广泛采用。
逻辑流复制是 9.4 中的新增功能,可以在更高级别和更有选择性地发送更改。
它一次只复制一个数据库。它只发送行更改,并且只发送已提交的事务,而不必发送真空数据、索引更改等。它可以选择性地只发送数据库中某些表的数据。这使得逻辑复制的带宽效率更高。
在更高级别上操作还意味着您可以在副本数据库上执行事务。您可以创建临时表和未记录的表。如果你愿意,即使是普通的桌子。您可以使用外部数据包装器、视图、创建函数,无论您喜欢什么。如果查询运行时间过长,也无需取消查询。
PostgreSQL 中也可以使用逻辑复制来构建多主复制,这是使用物理复制无法实现的。
但是,作为交换,它(目前)不能在发生大交易时进行流式传输。它必须等到他们提交。因此,在 master 上提交的大事务和应用于副本之间可能会有很长的延迟。
它严格按照提交顺序重放事务,因此小的快速事务可能会卡在大事务后面并延迟很长时间。
DDL 不会自动处理。您必须自己保持主表和副本之间的表定义同步,或者使用逻辑复制的应用程序必须有自己的工具来做到这一点。要做到这一点可能很复杂。
应用过程本身也比“在我被告知的地方写一些字节”更复杂。与物理复制相比,它在副本上占用的资源也更多。
当前的逻辑复制实现并不成熟或广泛采用,或者特别容易使用。
Too many options, tell me what to do
选择太多,告诉我该怎么做
Phew. Complicated, huh? And I haven't even got into the details of delayed replication, slots, wal_keep_segments
, timelines, how promotion works, Postgres-XL, BDR and multimaster, etc.
呼。很复杂吧?我什至没有深入了解延迟复制、插槽wal_keep_segments
、时间线、促销如何工作、Postgres-XL、BDR 和多主等的细节。
So what should you do?
那么你应该怎么做?
There's no single right answer. Otherwise PostgreSQL would only support that one way. But there are a few common use cases:
没有唯一的正确答案。否则 PostgreSQL 将只支持这种方式。但是有一些常见的用例:
For backup and disaster recoveryuse pgbarman
to make base backups and retain WAL for you, providing easy to manage continuous backup. You should still take periodic pg_dump
backups as extra insurance.
用于备份和灾难恢复,用于pgbarman
制作基础备份并为您保留 WAL,提供易于管理的连续备份。您仍然应该将定期pg_dump
备份作为额外的保险。
For high availability with zero data loss riskuse streaming synchronous replication.
为了实现零数据丢失风险的高可用性,请使用流式同步复制。
For high availability with low data loss risk and better performanceyou should use asynchronous streaming replication. Either have WAL archiving enabled for fallback or use a replication slot. Monitor how far the replica is behind the master using external tools like Icinga.
为了获得具有低数据丢失风险和更好性能的高可用性,您应该使用异步流复制。要么为回退启用 WAL 归档,要么使用复制槽。使用 Icinga 等外部工具监控副本落后于主副本的距离。