PostgreSQL:改进 pg_dump、pg_restore 性能

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original source: http://stackoverflow.com/questions/2094963/


PostgreSQL: improving pg_dump, pg_restore performance

Tags: performance, postgresql, backup, restore

Asked by Joe Creighton

When I began, I used pg_dump with the default plain format. I was unenlightened.

Research revealed to me time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.

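For reference, a minimal sketch of that pipeline (dbname is a placeholder for your database name):

% pg_dump -Fc dbname | gzip -9 -c > dumpfile.gz   # custom-format dump piped through gzip -9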

When it came time to create the database anew,

# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;

% gunzip dumpfile.gz              # to evaluate restore time without a piped uncompression
% pg_restore -d dbname dumpfile   # into a new, empty database defined above

I felt unenlightened: the restore took 12 hours to create the database that's only a fraction of what it will become:

# select pg_size_pretty(pg_database_size('dbname'));
47 GB

Because there are predictions this database will be a few terabytes, I need to look at improving performance now.

Please, enlighten me.

Accepted answer by Ants Aasma

First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular, shared_buffers should be set correctly, maintenance_work_mem should be increased during the restore, full_page_writes should be off during the restore, wal_buffers should be increased to 16MB during the restore, checkpoint_segments should be increased to something like 16 during the restore, you shouldn't have any unreasonable logging on (like logging every statement executed), and autovacuum should be disabled during the restore.

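As a rough sketch, those settings might look like this in postgresql.conf for the duration of the restore (the values are illustrative, not prescriptive; revert them once the restore finishes):

shared_buffers = 2GB              # a sensible fraction of RAM
maintenance_work_mem = 1GB        # speeds up index builds during the restore
full_page_writes = off
wal_buffers = 16MB
checkpoint_segments = 16          # pre-9.5 setting; newer versions use max_wal_size instead
log_statement = none              # avoid logging every executed statement
autovacuum = off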

If you are on 8.4, also experiment with parallel restore, the --jobs option for pg_restore.

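A minimal example, assuming a custom-format archive and a machine with four spare cores:

pg_restore -j 4 -d dbname dumpfile   # -j needs a seekable custom- or directory-format archive, not a pipe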

Answered by Yanar Assaf

Improve pg_dump & pg_restore

PG_DUMP | always use the directory format with the -j option

time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external

PG_RESTORE | always tune postgresql.conf and use the directory format with the -j option

# restore-time settings in postgresql.conf; revert after the restore completes
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1                  # -1 sizes wal_buffers automatically from shared_buffers

time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
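Here -j 8 runs eight parallel restore jobs, --format=d tells pg_restore the input is a directory archive, and -C makes pg_restore create the target database itself (connecting first to the postgres database given with -d).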

Answered by Matthew Wood

Two issues/ideas:

  1. By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) spent compressing and uncompressing the -Fc version of the backup.

  2. If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time (see the sketch after this list).

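Putting both points together, a sketch of what that might look like (dbname and the job count are placeholders):

pg_dump -Fc -Z 6 dbname > dumpfile    # -Fc is already compressed; -Z sets the compression level explicitly
pg_restore -j 4 -d dbname dumpfile    # 8.4+: restore over 4 parallel connections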

Answered by Tometzky

I assume you need a backup, not a major upgrade of the database.

For backups of large databases you should set up continuous archiving instead of pg_dump.

  1. Set up WAL archiving (a configuration sketch follows this list).

  2. Make your base backups, for example every day, by using:

    psql template1 -c "select pg_start_backup('`date +%F-%T`')"
    rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
    psql template1 -c "select pg_stop_backup()"

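For step 1, the archiving setup in postgresql.conf might look roughly like this (the archive path is an assumption; adjust it to your layout):

    wal_level = archive               # 'replica' on PostgreSQL 9.6 and later
    archive_mode = on
    archive_command = 'test ! -f /var/backups/pgsql/wal/%f && cp %p /var/backups/pgsql/wal/%f'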

A restore would be as simple as restoring the database and WAL logs not older than the pg_start_backup time from the backup location and starting Postgres. And it will be much faster.

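On the recovery side, a minimal sketch (PostgreSQL before version 12 reads this from recovery.conf; the path matches the assumed layout above):

    restore_command = 'cp /var/backups/pgsql/wal/%f %p'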

Answered by richo

zcat dumpfile.gz | pg_restore -d db_name

This removes the full write of the uncompressed data to disk, which is currently your bottleneck.

Answered by Will Hartung

As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.

So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.
