postgresql 在 Postgres 中创建索引的最有效方法

Question

提问by mvp

Is it more efficient to create an index after loading data is complete or before, or does it not matter?

在加载数据完成之后还是之前创建索引更有效，还是无关紧要？

For example, say I have 500 files to load into a Postgres 8.4 DB. Here are the two index creation scenarios I could use:

例如，假设我有 500 个文件要加载到 Postgres 8.4 DB 中。以下是我可以使用的两个索引创建场景：

Create index when table is created, then load each file into table; or
Create index after all files have been loaded into the table.

创建表时创建索引，然后将每个文件加载到表中；或者
在所有文件加载到表中后创建索引。

The table data itself is about 45 Gigabytes. The index is about 12 Gigabytes. I'm using a standard index. It is created like this:

表数据本身大约为 45 GB。该索引约为 12 GB。我正在使用标准索引。它是这样创建的：

CREATE INDEX idx_name ON table_name (column_name);

My data loading uses COPY FROM.

我的数据加载使用COPY FROM。

Once all the files are loaded, no updates, deletes or additional loads will occur on the table (it's a day's worth of data that will not change). So I wanted to ask which scenario would be most efficient? Initial testing seems to indicate that loading all the files and then creating the index (scenario 2) is faster, but I have not done a scientific comparison of the two approaches.

一旦加载了所有文件，表上就不会发生更新、删除或额外加载（这是一天的数据不会改变）。所以我想问一下哪种情况最有效？初步测试似乎表明加载所有文件然后创建索引（场景 2）更快，但我没有对这两种方法进行科学比较。

Answer 1

回答by mvp

Your observation is correct - it is much more efficient to load data first and only then create index. Reason for this is that index updates during insert are expensive. If you create index after all data is there, it is much faster.

您的观察是正确的 - 首先加载数据然后再创建索引效率更高。这样做的原因是插入期间的索引更新很昂贵。如果在所有数据都存在后创建索引，速度会快得多。

It goes even further - if you need to import large amount of data into existing indexed table, it is often more efficient to drop existing index first, import the data, and then re-create index again.

更进一步 - 如果您需要将大量数据导入现有索引表，通常先删除现有索引，导入数据，然后再次重新创建索引，效率更高。

One downside of creating index after importing is that table must be locked, and that may take long time (it will not be locked in opposite scenario). But, in PostgreSQL 8.2 and later, you can use CREATE INDEX CONCURRENTLY, which does not lock table during indexing (with some caveats).

导入后创建索引的一个缺点是表必须被锁定，这可能需要很长时间（在相反的情况下不会被锁定）。但是，在 PostgreSQL 8.2 及更高版本中，您可以使用CREATE INDEX CONCURRENTLY，它不会在索引期间锁定表（有一些警告）。

postgresql 在 Postgres 中创建索引的最有效方法

提问by mvp

回答by mvp

相关推荐

最近更新

标签

postgresql 在 Postgres 中创建索引的最有效方法

提问by mvp

回答by mvp

相关推荐

postgresql 如何使用 pg_stat_activity？

如何从 CSV 文件复制到带有 CSV 文件标题的 PostgreSQL 表？

postgresql Postgres -- 从字符串中删除一组字母

postgresql Postgres DB 未在 Mac OSX 上启动：错误表示：Unix 域套接字上的连接

相关推荐

最近更新

标签