Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/27972393/

Postgres: Checkpoints Are Occurring Too Frequently

postgresql

Asked by user1517922

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.

We're getting this warning about 300 times a day while rebuilding these datasets, with long stretches where the warnings are 2 to 5 seconds apart:

2015-01-15 12:32:53 EST [11403]: [10841-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)

These are the related settings:

checkpoint_completion_target    0.7
checkpoint_segments             64
checkpoint_timeout              5min
checkpoint_warning              30s
wal_block_size                  8192
wal_buffers                     4MB
wal_keep_segments               5000
wal_level                       hot_standby
wal_receiver_status_interval    10s
wal_segment_size                16MB
wal_sync_method                 fdatasync
wal_writer_delay                200ms
work_mem                        96MB
shared_buffers                  24GB
effective_cache_size            128GB

So that means we're writing 1024 MB worth of WAL files every 2 - 5 seconds, sometimes sustained for 15 - 30 minutes.
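
As a quick check of that figure, here is a minimal Python sketch of the arithmetic. It assumes every warning comes from a checkpoint triggered by filling checkpoint_segments WAL segments, which is the reasoning used above. The function names are illustrative only.

```python
# Values taken from the settings listed in the question.
CHECKPOINT_SEGMENTS = 64
WAL_SEGMENT_SIZE_MB = 16


def wal_mb_per_checkpoint(segments: int, segment_size_mb: int) -> int:
    """WAL volume (MB) written between two segment-triggered checkpoints."""
    return segments * segment_size_mb


def wal_rate_mb_per_s(wal_mb: float, seconds_between_checkpoints: float) -> float:
    """Average WAL write rate implied by a given checkpoint spacing."""
    return wal_mb / seconds_between_checkpoints


wal_mb = wal_mb_per_checkpoint(CHECKPOINT_SEGMENTS, WAL_SEGMENT_SIZE_MB)
print(f"WAL per checkpoint: {wal_mb} MB")

# The log shows checkpoints 2 to 5 seconds apart.
for interval_s in (2, 5):
    rate = wal_rate_mb_per_s(wal_mb, interval_s)
    print(f"{interval_s} s apart: about {rate:.1f} MB/s of WAL")
```

Under that assumption, 64 segments of 16 MB each do add up to 1024 MB per checkpoint.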

1) Do you see any settings we can improve on? Let me know if you need other settings documented.

2) Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?

The data we're rebuilding is stored elsewhere, so on the off chance the power failed AND the RAID battery backup didn't do its job, we're not out anything once the dataset gets rebuilt again.

Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?

Thanks!

PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.

Answered by Ben Grimm

Your server is generating so much WAL data because wal_level is set to hot_standby. I'm assuming you need this, so the best option to avoid the warnings is to increase your checkpoint_segments. But they are just that, warnings: it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be updating frequently.
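
To get a rough idea of how far checkpoint_segments would have to rise, the sketch below scales the current value by the ratio of the desired checkpoint spacing to the observed one. It assumes the WAL write rate stays the same and that checkpoints are triggered only by segment count. The 30-second target is simply the checkpoint_warning value from the question, and the helper name is hypothetical.

```python
import math

# Values taken from the question's settings and log output.
CURRENT_CHECKPOINT_SEGMENTS = 64
WAL_SEGMENT_SIZE_MB = 16
CHECKPOINT_WARNING_S = 30  # the warning fires when checkpoints are closer than this


def segments_needed(current_segments: int,
                    observed_interval_s: int,
                    target_interval_s: int) -> int:
    """Scale checkpoint_segments so that, at an unchanged WAL rate,
    segment-triggered checkpoints land at least target_interval_s apart."""
    return math.ceil(current_segments * target_interval_s / observed_interval_s)


# Observed spacing in the log was 2 to 5 seconds.
for observed_s in (2, 5):
    segs = segments_needed(CURRENT_CHECKPOINT_SEGMENTS, observed_s, CHECKPOINT_WARNING_S)
    wal_mb = segs * WAL_SEGMENT_SIZE_MB
    print(f"observed {observed_s} s apart: checkpoint_segments >= {segs} "
          f"({wal_mb} MB of WAL between checkpoints)")
```

This is only an estimate for silencing the warning. Whether spacing checkpoints that far apart is acceptable depends on disk space and crash-recovery time, which the sketch does not model.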

Changing synchronous_commit does not change what is written to the WAL; it changes when the commit returns, which allows the OS to buffer those writes.

It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.
