Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/35144403/

Time: 2020-09-09 06:09:51  Source: igfitidea

Which Postgresql WAL files can I safely remove from the WAL archive folder

postgresql archiving wal

Asked by undercurrent

Current situation

So I have WAL archiving set up to an independent internal hard drive on a data logging computer running Postgres. The hard drive containing the WAL archives is filling up, and I'd like to remove and archive all the WAL archive files, including the initial base backup, to external backup drives.

The directory structure is like:

D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)

D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup

The question I have then is:

  • Can I safely move literally every single WAL file except the current WAL archive file (000000000001.CA0000.. and so on), including the base backup, to another HDD? (Note that the database is live and receiving data.)

cheers!

Answered by Craig Ringer

WAL archives

You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.
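
To make the selection rule concrete, here is a minimal Python sketch of which archived segments count as "not required" by a given base backup. It is an illustration only, not the tool's implementation: it assumes the usual 24-hex-digit segment names, takes the cutoff from the first 24 characters of a backup history file name, and ignores the timeline prefix when comparing. The function name and the example file names are hypothetical.

```python
import re

# A WAL segment file name is 24 hex digits: timeline (8), log (8), segment (8).
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(archive_files, oldest_kept):
    """Return the archived WAL segments that sort before `oldest_kept`.

    `oldest_kept` may be a plain segment name or a backup history file name
    such as 000000010000003700000010.00000020.backup; only its first 24
    characters are used. Ignoring the timeline prefix (first 8 digits) in
    the comparison is an assumption of this sketch.
    """
    cutoff = oldest_kept[:24]
    if not WAL_NAME.match(cutoff):
        raise ValueError(f"not a WAL segment name: {oldest_kept!r}")
    return sorted(
        name for name in archive_files
        if WAL_NAME.match(name) and name[8:] < cutoff[8:]
    )
```

The sketch only lists candidates. Actual deletion is better left to pg_archivecleanup itself, ideally after a dry run.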

In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.

pg_xlog

Never remove WAL from pg_xlog manually. If you have too much WAL then:

  • your wal_keep_segments setting is keeping WAL around;
  • you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
  • your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
  • you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.

You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting, run a manual CHECKPOINT command.
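
One way to check the causes listed above is to ask the server directly. The following Python sketch builds read-only psql invocations for each item. It assumes psql is on the PATH and can connect without a password prompt, and the helper names (DIAGNOSTICS, psql_command, run_diagnostics) are hypothetical. Note that newer PostgreSQL releases renamed some of these: checkpoint_segments was replaced by max_wal_size in 9.5, pg_xlog became pg_wal in 10, and wal_keep_segments was replaced by wal_keep_size in 13, so a SHOW on a retired name will simply report an error.

```python
import subprocess

# Read-only checks, one per cause listed above. Setting names follow the
# answer; newer PostgreSQL releases renamed some of them.
DIAGNOSTICS = {
    "wal_keep_segments": "SHOW wal_keep_segments;",
    "archive_mode": "SHOW archive_mode;",
    "archive_command": "SHOW archive_command;",
    "checkpoint_segments": "SHOW checkpoint_segments;",
    "replication_slots": (
        "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
    ),
}


def psql_command(sql, dbname="postgres"):
    """Build a psql argument list: no psqlrc, unaligned, tuples only."""
    return ["psql", "-X", "-A", "-t", "-d", dbname, "-c", sql]


def run_diagnostics(dbname="postgres"):
    """Run every check and return its output (or the error text) by label."""
    results = {}
    for label, sql in DIAGNOSTICS.items():
        proc = subprocess.run(
            psql_command(sql, dbname), capture_output=True, text=True
        )
        ok = proc.returncode == 0
        results[label] = (proc.stdout if ok else proc.stderr).strip()
    return results
```

After changing a setting, the manual checkpoint can be issued the same way, for example with subprocess.run(psql_command("CHECKPOINT;")).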

If you have an offline server and need to remove WAL to start it you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server itself ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.

Answered by Patrick

WAL files are incremental, so the simple answer is: You cannot throw any files out. The solution is to make a new base backup and then all previous WALs can be deleted.

The WAL files record the individual changes made to the database (as low-level change records rather than SQL statements), so if you throw some older WALs out, the recovery process will fail (it will not silently skip missing WAL files) because the state of the database cannot be restored reliably. You can move the WAL files to some other location without upsetting the WAL process, but then you'd have to make all WAL files available again from a single location if you ever need to recover your database from some point in the past. If you are running out of disk space, that may mean recovering from some location where you have enough space to store the base backup and all WAL files. The main issue here is whether you can do that fast enough to restore a full database after an incident.
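
Because recovery stops at the first missing segment, it can be worth checking that an archive is contiguous before relying on it or after moving files between drives. The Python sketch below lists the gaps. It rests on several assumptions: 24-hex-digit segment names, the default 16 MB segment size (256 segments per log number), a single timeline, and PostgreSQL 9.3 or later (older releases skipped the last segment of each log number). All function names are hypothetical.

```python
import re

WAL_NAME = re.compile(r"^[0-9A-F]{24}$")
SEGMENTS_PER_LOG = 0x100  # assumes the default 16 MB segment size


def segment_number(name):
    """Map a segment file name to a single increasing integer."""
    log, seg = int(name[8:16], 16), int(name[16:24], 16)
    return log * SEGMENTS_PER_LOG + seg


def segment_name(timeline, number):
    """Inverse of segment_number for a given timeline."""
    log, seg = divmod(number, SEGMENTS_PER_LOG)
    return f"{timeline:08X}{log:08X}{seg:08X}"


def missing_segments(archive_files):
    """List segment names absent between the oldest and newest archived one."""
    names = sorted(n for n in archive_files if WAL_NAME.match(n))
    if not names:
        return []
    if len({n[:8] for n in names}) != 1:
        raise ValueError("this sketch handles a single timeline only")
    timeline = int(names[0][:8], 16)
    present = {segment_number(n) for n in names}
    return [
        segment_name(timeline, number)
        for number in range(min(present), max(present) + 1)
        if number not in present
    ]
```

An empty result only says the file names form an unbroken sequence. It does not verify that the files themselves are intact.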

Another issue is that if you cannot identify where or when the problem that needs to be corrected occurred, your only option is to start with the base backup and then replay all the WAL files. This procedure is not difficult, but if you have an old base backup and many WAL files to process, it simply takes a lot of time.

The best approach for your case, in general, is to make a new base backup every x months and collect WALs with that base backup. After every new base backup you can delete the old base backup and its subsequent WALs or move them to cheap offline storage (DVD, tape, etc). In the case of a major incident you can quickly restore the database to a known correct state from the recent base backup and the relatively few WAL files collected since then.

Answered by Viorel

A solution that we went for is executing pg_basebackup every night. This creates a base backup, and later on we can use pg_archivecleanup to clean up all the "old" WAL files before that base backup using something like:

"%POSTGRES_INSTALLDIR%\bin\pg_archivecleanup" -d %WAL_backup_dir% %newestBaseFile%

Fortunately, we have not had to recover yet, but it should work in theory.
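
A rough Python sketch of how the %newestBaseFile% value in the command above might be determined and the command assembled. It assumes the archive directory contains the .backup history files written for each base backup, and that sorting their names lexicographically matches backup order (which holds within a single timeline). The function names are hypothetical, and the sketch defaults to the -n dry-run flag so nothing is deleted until the output has been reviewed.

```python
from pathlib import Path


def newest_backup_history_file(archive_dir):
    """Return the name of the last *.backup file in the archive, if any."""
    candidates = sorted(p.name for p in Path(archive_dir).glob("*.backup"))
    return candidates[-1] if candidates else None


def cleanup_command(archive_dir, dry_run=True):
    """Build a pg_archivecleanup argument list for the newest base backup."""
    newest = newest_backup_history_file(archive_dir)
    if newest is None:
        raise FileNotFoundError("no .backup history file found in archive")
    command = ["pg_archivecleanup", "-d"]  # -d prints debug output
    if dry_run:
        command.append("-n")  # only print the files that would be removed
    command += [str(archive_dir), newest]
    return command
```

The resulting list can be passed to subprocess.run once the dry-run output looks right. Testing an actual restore from the newest base backup is still the only way to confirm the scheme works.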

Answered by Aldo

In case someone found this by searching for how to safely clean up the WAL directory under a replication architecture, consider the scenario where there might be leftovers from offline replicas: in this case, unused replication slots waiting for the replica to come back online, and thus keeping a lot of WAL files on the master DB.

In our case we had an issue with a replica going down due to hardware failure. We had to recreate it along with its replica_slot on the master DB, but forgot to get rid of the previously used one. Once we cleared that out, PostgreSQL got rid of the unused WALs and all was good.
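
As an illustration of that cleanup, the Python sketch below turns the rows of SELECT slot_name, active FROM pg_replication_slots into one pg_drop_replication_slot call per inactive slot. The function name and the example slot names are hypothetical, and the sketch only produces SQL text for review. An inactive slot may belong to a replica that is merely offline for a while, so each statement should be checked before it is run.

```python
def drop_statements(slots):
    """Build one drop call per inactive replication slot.

    `slots` is an iterable of (slot_name, active) pairs, for example the
    rows of: SELECT slot_name, active FROM pg_replication_slots
    """
    statements = []
    for slot_name, active in slots:
        if active:
            continue  # a connected replica is still using this slot
        quoted = slot_name.replace("'", "''")  # defensive SQL quoting
        statements.append(f"SELECT pg_drop_replication_slot('{quoted}');")
    return statements
```

For example, drop_statements([("replica_new", True), ("replica_old", False)]) yields a single statement for replica_old.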