PostgreSQL:改进 pg_dump、pg_restore 性能

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original source: http://stackoverflow.com/questions/2094963/


PostgreSQL: improving pg_dump, pg_restore performance

Tags: performance, postgresql, backup, restore

Asked by Joe Creighton

When I began, I used pg_dump with the default plain format. I was unenlightened.

Research revealed to me time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened.

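For reference, a minimal sketch of that pipeline (dbname is a placeholder for your database name):

% pg_dump -Fc dbname | gzip -9 -c > dumpfile.gz   # custom-format dump piped through gzip -9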

When it came time to create the database anew,

# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;

% gunzip dumpfile.gz              # to evaluate restore time without a piped uncompression
% pg_restore -d dbname dumpfile   # into a new, empty database defined above

I felt unenlightened: the restore took 12 hours to create the database that's only a fraction of what it will become:

# select pg_size_pretty(pg_database_size('dbname'));
47 GB

Because there are predictions this database will be a few terabytes, I need to look at improving performance now.

Please, enlighten me.

Accepted answer by Ants Aasma

First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular, shared_buffers should be set correctly, maintenance_work_mem should be increased during the restore, full_page_writes should be off during the restore, wal_buffers should be increased to 16MB during the restore, checkpoint_segments should be increased to something like 16 during the restore, you shouldn't have any unreasonable logging on (like logging every statement executed), and autovacuum should be disabled during the restore.

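As a rough sketch, those settings might look like this in postgresql.conf for the duration of the restore (the values are illustrative, not prescriptive; revert them once the restore finishes):

shared_buffers = 2GB              # a sensible fraction of RAM
maintenance_work_mem = 1GB        # speeds up index builds during the restore
full_page_writes = off
wal_buffers = 16MB
checkpoint_segments = 16          # pre-9.5 setting; newer versions use max_wal_size instead
log_statement = none              # avoid logging every executed statement
autovacuum = off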

If you are on 8.4, also experiment with parallel restore, the --jobs option for pg_restore.

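A minimal example, assuming a custom-format archive and a machine with four spare cores:

pg_restore -j 4 -d dbname dumpfile   # -j needs a seekable custom- or directory-format archive, not a pipe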

Answered by Yanar Assaf

Improve pg_dump & pg_restore

PG_DUMP | always use the directory format with the -j option

time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external

PG_RESTORE | always tune postgresql.conf and use the directory format with the -j option

# restore-time settings in postgresql.conf; revert after the restore completes
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1                  # -1 sizes wal_buffers automatically from shared_buffers

time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
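Here -j 8 runs eight parallel restore jobs, --format=d tells pg_restore the input is a directory archive, and -C makes pg_restore create the target database itself (connecting first to the postgres database given with -d).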

Answered by Matthew Wood

Two issues/ideas:

  1. By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) spent compressing and uncompressing the -Fc version of the backup.

  2. If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time (see the sketch after this list).

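Putting both points together, a sketch of what that might look like (dbname and the job count are placeholders):

pg_dump -Fc -Z 6 dbname > dumpfile    # -Fc is already compressed; -Z sets the compression level explicitly
pg_restore -j 4 -d dbname dumpfile    # 8.4+: restore over 4 parallel connections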

Answered by Tometzky

I assume you need a backup, not a major upgrade of the database.

For backups of large databases you should set up continuous archiving instead of pg_dump.

  1. Set up WAL archiving (a configuration sketch follows this list).

  2. Make your base backups, for example every day, by using:

    psql template1 -c "select pg_start_backup('`date +%F-%T`')"
    rsync -a --delete /var/lib/pgsql/data/ /var/backups/pgsql/base/
    psql template1 -c "select pg_stop_backup()"

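For step 1, the archiving setup in postgresql.conf might look roughly like this (the archive path is an assumption; adjust it to your layout):

    wal_level = archive               # 'replica' on PostgreSQL 9.6 and later
    archive_mode = on
    archive_command = 'test ! -f /var/backups/pgsql/wal/%f && cp %p /var/backups/pgsql/wal/%f'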

A restore would be as simple as restoring the database and WAL logs not older than the pg_start_backup time from the backup location and starting Postgres. And it will be much faster.

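On the recovery side, a minimal sketch (PostgreSQL before version 12 reads this from recovery.conf; the path matches the assumed layout above):

    restore_command = 'cp /var/backups/pgsql/wal/%f %p'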

Answered by richo

zcat dumpfile.gz | pg_restore -d db_name

This removes the full write of the uncompressed data to disk, which is currently your bottleneck.

Answered by Will Hartung

As you may have guessed simply by the fact that compressing the backup results in faster performance, your backup is I/O bound. This should come as no surprise as backup is pretty much always going to be I/O bound. Compressing the data trades I/O load for CPU load, and since most CPUs are idle during monster data transfers, compression comes out as a net win.

So, to speed up backup/restore times, you need faster I/O. Beyond reorganizing the database to not be one huge single instance, that's pretty much all you can do.
