Original question: http://stackoverflow.com/questions/15692508/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
A faster way to copy a PostgreSQL database (or the best way)
Asked by David Bain
I did a pg_dump of a database and am now trying to install the resulting .sql file onto another server.
I'm using the following command.
psql -f databasedump.sql
I initiated the database install earlier today, and now, 7 hours later, the database is still being populated. I don't know if this is how long it is supposed to take, but I continue to monitor it; so far I've seen over 12 million inserts and counting. I suspect there's a faster way to do this.
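(For reference, a plain-SQL load like this is normally pointed at an explicit target database; a minimal sketch, assuming a hypothetical target database named mydb, using psql's standard --single-transaction flag and ON_ERROR_STOP variable:)

psql -d mydb -1 -v ON_ERROR_STOP=1 -f databasedump.sql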
Answered by mullerivan
Create your dumps with
pg_dump -Fc -Z 9 --file=file.dump myDb
-Fc: --format=custom
Output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.
-Z 9: --compress=0..9
Specify the compression level to use. Zero means no compression. For the custom archive format, this specifies compression of individual table-data segments, and the default is to compress at a moderate level. For plain text output, setting a nonzero compression level causes the entire output file to be compressed, as though it had been fed through gzip; but the default is not to compress. The tar archive format currently does not support compression at all.
and restore it with
pg_restore -Fc -j 8 -d myDb file.dump
-j: --jobs=number-of-jobs
Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.
Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.
The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.
Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
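Putting the pieces together, a minimal end-to-end sketch (the target database name myDb_copy is hypothetical; createdb -T template0 and pg_restore --list are standard PostgreSQL client options):

pg_dump -Fc -Z 9 --file=file.dump myDb        # compressed custom-format dump
pg_restore --list file.dump                   # optional: inspect the archive's table of contents
createdb -T template0 myDb_copy               # create an empty target database
pg_restore -Fc -j 8 -d myDb_copy file.dump    # parallel restore into it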
Answered by Yanar Assaf
Improve pg_dump & restore
PG_DUMP | always use the directory format with the -j option
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf, and use the directory format with the -j option
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir
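Note that -C creates the database recorded in the dump, and -d postgres only names the database used for the initial connection. A minimal alternative sketch, assuming a hypothetical target database mydb that already exists (and remembering to revert the postgresql.conf settings above once the restore finishes):

time pg_restore -j 8 --format=d -d mydb /tmp/newout.dir    # restore into an existing database, no -C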
For more info:
https://gitlab.com/yanar/Tuning/wikis/improve-pg-dump&restore
Answered by Richard Huxton
Why are you producing a raw .sql dump? The opening description of pg_dump recommends the "custom" format -Fc.
Then you can use pg_restore, which will restore your data (or selected parts of it). There is a "number of jobs" option -j which can use multiple cores (assuming your disks aren't already the limiting factor). In most cases, on a modern machine, you can expect at least some gains from this.
Now you say "I don't know how long this is supposed to take". Well, until you've done a few restores you won't know. Do monitor what your system is doing and whether you are limited by CPU or disk I/O.
Finally, the configuration settings you want while restoring a database are not the ones you want for running it. A couple of useful starters:
- Increase maintenance_work_mem so you can build indexes in larger chunks
- Turn off fsync during the restore. If your machine crashes, you'll start from scratch again anyway.
Do remember to reset them after the restore though.
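A minimal sketch of one way to do that, assuming superuser access and PostgreSQL 9.4 or later (where ALTER SYSTEM is available); both settings can be changed with a reload rather than a restart, and the database name mydb and archive file.dump are placeholders:

psql -d postgres -c "ALTER SYSTEM SET maintenance_work_mem = '1GB';"
psql -d postgres -c "ALTER SYSTEM SET fsync = off;"
psql -d postgres -c "SELECT pg_reload_conf();"
pg_restore -Fc -j 8 -d mydb file.dump    # run the restore with the relaxed settings
psql -d postgres -c "ALTER SYSTEM RESET maintenance_work_mem;"
psql -d postgres -c "ALTER SYSTEM RESET fsync;"
psql -d postgres -c "SELECT pg_reload_conf();"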
Answered by hoxworth
The usage of pg_dump is generally recommended to be paired with pg_restore instead of psql. Note that pg_restore needs a custom- or directory-format archive rather than a plain .sql file. The load can then be split among cores to speed it up by passing the --jobs flag, as such (assuming a hypothetical target database mydb):
$ pg_restore --jobs=8 -d mydb db.dump
Postgres themselves have a guide on bulk loading of data.
I would also recommend heavily tuning your postgresql.conf configuration file and setting appropriately high values for maintenance_work_mem and checkpoint_segments; higher values for these may dramatically increase your write performance. (Note that checkpoint_segments was replaced by max_wal_size in PostgreSQL 9.5.)
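A hypothetical postgresql.conf sketch for such a restore window (the values are illustrative and depend on your RAM and PostgreSQL version; revert them afterwards):

maintenance_work_mem = 1GB
checkpoint_segments = 32     # PostgreSQL 9.4 and earlier
# max_wal_size = 4GB         # the 9.5+ replacement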