Original question: http://stackoverflow.com/questions/1745105/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on StackOverflow.
Postgres dump of only parts of tables for a dev snapshot
Asked by Jonathan Peterson
On production our database is a few hundred gigabytes in size. For development and testing, we need to create snapshots of this database that are functionally equivalent, but which are only 10 or 20 gigs in size.
The challenge is that the data for our business entities are scattered across many tables. We want to create some sort of filtered snapshot so that only some of the entities are included in the dump. That way we can get fresh snapshots every month or so for dev and testing.
For example, let's say we have entities that have these many-to-many relationships:
- Company has N Divisions
- Division has N Employees
- Employee has N Attendance Records
There are maybe 1000 companies, 2500 divisions, 175000 employees, and tens of millions of attendance records. We want a replicable way to pull, say, the first 100 companies and all of their constituent divisions, employees, and attendance records.
We currently use pg_dump for the schema, and then run pg_dump with --disable-triggers and --data-only to get all the data out of the smaller tables. We don't want to have to write custom scripts to pull out part of the data because we have a fast development cycle and are concerned the custom scripts would be fragile and likely to be out of date.
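For reference, the workflow described above amounts to two pg_dump passes; the database and table names here are placeholders, not from the original question:

```shell
# Schema only
pg_dump --schema-only proddb > schema.sql

# Data only, with triggers disabled during restore, for the smaller tables
pg_dump --data-only --disable-triggers \
        -t companies -t divisions proddb > small_tables.sql
```

The problem is that there is no built-in way to add a row filter (a WHERE clause) to the second pass, which is what the question is asking for.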
How can we do this? Are there third-party tools that can help pull out logical partitions from the database? What are these tools called?
Any general advice is also appreciated!
Answered by Ben
On your larger tables you can use the COPY command to pull out subsets...
COPY (SELECT * FROM mytable WHERE ...) TO '/tmp/myfile.tsv'
COPY mytable FROM '/tmp/myfile.tsv'
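Applied to the example hierarchy in the question, the same idea can cascade down the foreign keys. This is a sketch only; the table names, `id` columns, and foreign-key columns are assumptions, not part of the original question:

```sql
-- Pick the first 100 companies (assumed primary key "id").
COPY (SELECT * FROM companies ORDER BY id LIMIT 100)
    TO '/tmp/companies.tsv';

-- Divisions belonging to those companies (assumed FK "company_id").
COPY (SELECT d.*
      FROM divisions d
      WHERE d.company_id IN (SELECT id FROM companies ORDER BY id LIMIT 100))
    TO '/tmp/divisions.tsv';

-- Employees of those divisions (assumed FK "division_id").
COPY (SELECT e.*
      FROM employees e
      JOIN divisions d ON d.id = e.division_id
      WHERE d.company_id IN (SELECT id FROM companies ORDER BY id LIMIT 100))
    TO '/tmp/employees.tsv';
```

Note that server-side `COPY ... TO '/path'` requires superuser privileges; from psql you can use the client-side `\copy` variant instead to write to a local file.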
https://www.postgresql.org/docs/current/static/sql-copy.html
You should consider maintaining a set of development data rather than just pulling a subset of your production. In the case that you're writing unit tests, you could use the same data that is required for the tests, trying to hit all of the possible use cases.
Answered by Aleksander Kmetec
I don't know about any software which already does this, but I can think of 3 alternative solutions. Unfortunately, they all require some custom coding.
1. Re-create all the tables in a separate schema, then copy into those tables only the subset of data you would like to dump, using

   INSERT INTO copy.tablename SELECT * FROM tablename WHERE ...

   and dump that.
2. Write your own script for dumping data as SQL statements. I have used this approach in the past and it only took something like 20-30 lines of PHP.
3. Modify pg_dump so it accepts a condition along with the -t switch when dumping a single table.
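A minimal sketch of the first option, assuming a scratch schema named `devcopy` and the `companies` table from the question (repeat the same pattern for each table in the hierarchy):

```sql
CREATE SCHEMA devcopy;

-- Clone the table structure without data; INCLUDING ALL also copies
-- defaults, constraints and indexes.
CREATE TABLE devcopy.companies (LIKE companies INCLUDING ALL);

-- Copy only the wanted subset into the scratch schema.
INSERT INTO devcopy.companies
SELECT * FROM companies ORDER BY id LIMIT 100;
```

Afterwards, `pg_dump -n devcopy proddb` dumps only that schema, giving you a self-contained snapshot of the subset.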