How to check the replication delay in PostgreSQL?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/28323355/
Asked by Alf162
I would like to measure the time between data being inserted into a table on the master and appearing in the same table on the slave, using streaming replication in PostgreSQL 9.3. For this I created a table test_time with 2 fields: id (serial) and t (text). After that I added a trigger:
cur_time:=to_char(current_timestamp, 'HH12:MI:SS:MS:US');
update test_time set t=cur_time where id=new.id;
But the time is the same in both tables. How can I measure the delay time?
Accepted answer by Craig Ringer
You can get the delay in bytes from the master side quite easily by using pg_xlog_location_diff to compare the master's pg_current_xlog_insert_location with the replay_location for that backend's pg_stat_replication entry.
This only works when run on the master. You can't do it from the replica because the replica has no idea how far ahead the master is.
Additionally this won't tell you the lag in seconds. In current (as of 9.4 at least) PostgreSQL versions there's no timestamp associated with a commit or a WAL record. So there's no way to tell how long ago a given LSN (xlog position) was.
The only way to get the replica lag in seconds on a current PostgreSQL version is to have an external process periodically commit an update to a dedicated timestamp table. You can then compare current_timestamp on the replica with the timestamp of the most recent entry in that table visible on the replica to see how far the replica is behind. This creates additional WAL traffic that will then have to be kept in your archived WAL for PITR (PgBarman or whatever), so you should balance the increased data use against the granularity of lag detection you require.
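The heartbeat-table approach described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the table name replication_heartbeat, its columns, and the helper heartbeat_lag are hypothetical names chosen for illustration, and the SQL strings are meant to be executed by whatever client or scheduler you already use.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical one-row heartbeat table; all names here are assumptions.
CREATE_SQL = """
CREATE TABLE replication_heartbeat (
    id int PRIMARY KEY,
    ts timestamptz NOT NULL
);
INSERT INTO replication_heartbeat (id, ts) VALUES (1, now());
"""

# Run periodically on the master by an external process (cron, a daemon, ...).
UPDATE_SQL = "UPDATE replication_heartbeat SET ts = now() WHERE id = 1;"

# Run on the replica to read the newest heartbeat that has been replayed there.
SELECT_SQL = "SELECT ts FROM replication_heartbeat WHERE id = 1;"


def heartbeat_lag(replica_now: datetime, last_heartbeat: datetime) -> timedelta:
    """Return how far the replica is behind, clamped at zero.

    A negative difference can only come from clock skew between the two
    hosts, so it is reported as no lag.
    """
    return max(replica_now - last_heartbeat, timedelta(0))


if __name__ == "__main__":
    # Made-up timestamps, purely to show the call.
    now = datetime(2015, 2, 4, 12, 0, 5, tzinfo=timezone.utc)
    beat = datetime(2015, 2, 4, 12, 0, 2, tzinfo=timezone.utc)
    print(heartbeat_lag(now, beat).total_seconds())
```

The value computed this way can overstate the real delay by up to one update interval, which is the granularity trade-off mentioned in the answer.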
PostgreSQL 9.5 may add commit timestamps that will hopefully let you find out how long ago a given commit happened and therefore how far a replica is behind in wall-clock seconds.
Answer by dbenhur
Alf162 mentioned a good solution in the comments to Craig Ringer's answer; so I'm adding this to clarify.
PostgreSQL has an administrative function pg_last_xact_replay_timestamp() which returns the timestamp of the last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary.
So the query select now()-pg_last_xact_replay_timestamp() as replication_lag, run on a replica, will return a duration representing the difference in time between the current clock and the timestamp of the last WAL record applied from the replication stream.
Note that if the master is not receiving new mutations, there will be no WAL records to stream and the lag calculated this way will grow without actually being a signal of delays in replication. If the master is under more or less continuous mutation, it will be continuously streaming WALs and the above query is a fine approximation of the time delay for changes on the master to materialize on the slave. Accuracy will obviously be affected by how rigorously synchronized the system clocks on the two hosts are.
Answer by Ramanan
If your database has frequent writes, then the query below is a close approximation of the slave lag:
select now() - pg_last_xact_replay_timestamp() AS replication_delay;
Below is a more accurate query for calculating replication lag for databases with very few writes. If the master does not send any writes to the slave, pg_last_xact_replay_timestamp() stays constant, so the query above may not accurately determine the slave lag.
SELECT CASE WHEN pg_last_xlog_receive_location() =
pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
pg_last_xact_replay_timestamp()) END AS log_delay;
Answer by shurikk
A slightly different version of the correct answer:
postgres=# SELECT
pg_last_xlog_receive_location() receive,
pg_last_xlog_replay_location() replay,
(
extract(epoch FROM now()) -
extract(epoch FROM pg_last_xact_replay_timestamp())
)::int lag;
receive | replay | lag
------------+------------+-------
1/AB861728 | 1/AB861728 | 2027
The lag is only meaningful when "receive" is not equal to "replay". Execute the query on the replica.
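The rule above (ignore the timestamp difference when "receive" equals "replay") can be written as a small client-side helper. This is a sketch only: the function name effective_lag and its parameters are hypothetical, the inputs are assumed to be the values returned by the query on the replica, and it assumes the replica is still connected and receiving WAL from the master.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_lag(receive_lsn: str, replay_lsn: str,
                  now: datetime,
                  last_replay_ts: Optional[datetime]) -> timedelta:
    """Report zero lag when everything received has already been replayed."""
    if receive_lsn == replay_lsn:
        # Nothing is waiting to be replayed, so an old timestamp only
        # means the master has been idle.
        return timedelta(0)
    if last_replay_ts is None:
        # pg_last_xact_replay_timestamp() returns NULL when no transaction
        # has been replayed during recovery yet.
        raise ValueError("no replayed transaction timestamp available")
    return now - last_replay_ts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Same positions as in the sample output: the 2027 s difference is ignored.
    print(effective_lag("1/AB861728", "1/AB861728",
                        now, now - timedelta(seconds=2027)))
```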
Answer by Vao Tsun
As of the PostgreSQL 10 release:
https://www.postgresql.org/docs/10/static/monitoring-stats.html#pg-stat-replication-view
write_lag (interval): Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronous_commit level remote_write incurred while committing if this server was configured as a synchronous standby.
flush_lag (interval): Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level remote_flush incurred while committing if this server was configured as a synchronous standby.
replay_lag (interval): Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronous_commit level remote_apply incurred while committing if this server was configured as a synchronous standby.
(formatting mine)
Alas, the new columns seem to suit only synchronous replication (otherwise the master would not know the exact lag), so the delay check for async replication seems to remain now()-pg_last_xact_replay_timestamp()...
Answer by linehrr
On the master, you can run select * from pg_stat_replication; which will give you:
| sent_lsn | write_lsn | flush_lsn | replay_lsn
-+-------------+-------------+-------------+-------------
| 8D/2DA48000 | 8D/2DA48000 | 8D/2DA48000 | 89/56A0D500
Those tell you where your offsets are. As you can see from this example, replay on the replica is behind.
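To turn offsets like these into a byte count on the client side, the LSN text form can be parsed directly: it is two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position. The sketch below uses hypothetical helper names (lsn_to_int, lsn_diff_bytes); on the server itself, pg_wal_lsn_diff in PostgreSQL 10 and later (pg_xlog_location_diff in earlier releases) performs the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '8D/2DA48000' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(ahead: str, behind: str) -> int:
    """Return how many bytes of WAL separate two LSNs."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


if __name__ == "__main__":
    # flush_lsn and replay_lsn taken from the sample row above.
    lag = lsn_diff_bytes("8D/2DA48000", "89/56A0D500")
    print(f"replay is {lag} bytes ({lag / 2**30:.1f} GiB) behind flush")
```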