为什么 PostgresQL 查询性能会随着时间的推移而下降，但重建索引时会恢复

Question

提问by Jim Rush

According to this pagein the manual, indexes don't need to be maintained. However, we are running with a PostgresQL table that has a continuous rate of updates, deletesand insertsthat over time (a few days) sees a significant query degradation. If we delete and recreate the index, query performance is restored.

根据手册中的这个页面，indexes don't need to be maintained. 但是，我们正在运行一个 PostgresQL 表，它的连续速率为updates，deletes并且inserts随着时间的推移（几天），查询性能显着下降。如果我们删除并重新创建索引，查询性能就会恢复。

We are using out of the box settings.
The table in our test is currently starting out empty and grows to half a million rows. It has a fairly large row (lots of text fields).

我们正在使用开箱即用的设置。
我们测试中的表目前一开始是空的，然后增长到 50 万行。它有一个相当大的行（很多文本字段）。

We are searching based of an index, not the primary key(I've confirmed the index is being used, at least under normal conditions)

我们是searching based of an index, not the primary key（我已经确认正在使用索引，至少在正常情况下）

The table is being used as a persistent store for a single process. Using PostgresQL on Windows with a Java client.

该表被用作单个进程的持久存储。在带有 Java 客户端的 Windows 上使用 PostgresQL。

I'm willing to give up insert and update performanceto keep up the query performance.

我愿意放弃insert and update performance以保持查询性能。

We are considering rearchitecting the application so that data is spread across various dynamic tables in a manner that allows us to drop and rebuild indexes periodically without impacting the application. However, as always, there is a time crunch to get this to work and I suspect we are missing something basic in our configuration or usage.

我们正在考虑重新构建应用程序，以便数据以允许我们定期删除和重建索引而不影响应用程序的方式分布在各种动态表中。然而，与往常一样，要让它发挥作用需要时间，我怀疑我们在配置或使用中缺少一些基本的东西。

We have considered forcing vacuumingand rebuild to run at certain times, but I suspect the locking period for such an action would cause our query to block. This may be an option, but there are some real-time (windows of 3-5 seconds) implications that require other changes in our code.

我们已经考虑过forcing vacuuming和rebuild to run at certain times，但我怀疑locking period for such an action would cause our query to block. 这可能是一个选项，但有一些实时（3-5 秒的窗口）影响需要我们的代码进行其他更改。

Additional information:Table and index

附加信息：表和索引

CREATE TABLE icl_contacts
(
  id bigint NOT NULL,
  campaignfqname character varying(255) NOT NULL,
  currentstate character(16) NOT NULL,
  xmlscheduledtime character(23) NOT NULL,
...
25 or so other fields.  Most of them fixed or varying character fiel  
...
  CONSTRAINT icl_contacts_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE icl_contacts OWNER TO postgres;

CREATE INDEX icl_contacts_idx
  ON icl_contacts
  USING btree
  (xmlscheduledtime, currentstate, campaignfqname);

Analyze:

分析：

Limit  (cost=0.00..3792.10 rows=750 width=32) (actual time=48.922..59.601 rows=750 loops=1)
  ->  Index Scan using icl_contacts_idx on icl_contacts  (cost=0.00..934580.47 rows=184841 width=32) (actual time=48.909..55.961 rows=750 loops=1)
        Index Cond: ((xmlscheduledtime < '2010-05-20T13:00:00.000'::bpchar) AND (currentstate = 'SCHEDULED'::bpchar) AND ((campaignfqname)::text = '.main.ee45692a-6113-43cb-9257-7b6bf65f0c3e'::text))

And, yes, I am aware there there are a variety of things we could do to normalize and improve the design of this table. Some of these options may be available to us.

而且，是的，我知道有各种各样的事情we could do to normalize and improve the design of this table。我们可能会提供其中一些选项。

My focus in this question isabout understanding how PostgresQL is managing the index and query over time (understand why, not just fix). If it were to be done over or significantly refactored, there would be a lot of changes.

我在这个问题上的重点是理解how PostgresQL is managing the index and query over time (understand why, not just fix)。如果要重新完成或显着重构，将会有很多变化。

Answer 1

采纳答案by Timothy

Auto vacuum should do the trick, provided you configured it for your desired performance.

自动真空应该可以解决问题，前提是您将其配置为所需的性能。

Notes: VACUUM FULL: this will rebuild table statistics and reclaim loads of disk space. It locks the whole table.

注意： VACUUM FULL：这将重建表统计信息并回收磁盘空间负载。它锁定整个表。

VACUUM: this will rebuild table statistics and reclaim some disk space. It can be run in parallel with production system, but generates lots of IO which can impact performance.

VACUUM：这将重建表统计信息并回收一些磁盘空间。它可以与生产系统并行运行，但会产生大量影响性能的 IO。

ANALYZE: this will rebuild query planner statistics. This is triggered by VACUUM, but can be run on its own.

ANALYZE：这将重建查询计划器统计信息。这是由 VACUUM 触发的，但可以自行运行。

More detailed notes found here

在这里找到更详细的注释

Answer 2

回答by ákos Kiss

As for performance, using strings for storing time and status info is quite a bottleneck. First of all, indexes on texts are extremely inefficient, comparing two times on the same day needs at least 11 comparison (in the format you used), however, using time type it can be reduced to simply one comparison. This also effects the size of the index, and a large index is hard to search over, and the db won't keep it in memory. Same considerations apply to the state column. If it represents a small set of states, you should use integer numbers mapped to states, this will reduce the nodes of the index - and the index size accordingly. Furthermore, this index will be useless even using theese built-in types if you don't specify the actual time in your query.

至于性能，使用字符串来存储时间和状态信息是一个相当大的瓶颈。首先，对文本的索引效率极低，同一天比较两次至少需要11次比较（按照你使用的格式），但是，使用时间类型可以简化为一次比较。这也会影响索引的大小，大索引很难搜索，数据库不会将其保留在内存中。相同的注意事项适用于 state 列。如果它代表一小组状态，您应该使用映射到状态的整数，这将减少索引的节点 - 并相应地减少索引大小。此外，如果您没有在查询中指定实际时间，即使使用这些内置类型，该索引也将毫无用处。

Answer 3

回答by Chris Curvey

This smells like index bloat to me. I'l refer you to this page

这对我来说就像索引膨胀一样。我会推荐你到这个页面

http://www.postgresql.org/docs/8.3/static/routine-reindex.html

which says at the bottom:

在底部说：

Also, for B-tree indexes a freshly-constructed index is somewhat faster to access than one that has been updated many times, because logically adjacent pages are usually also physically adjacent in a newly built index. (This consideration does not currently apply to non-B-tree indexes.) It might be worthwhile to reindex periodically just to improve access speed.

此外，对于 B 树索引，新构建的索引比已更新多次的索引访问起来要快一些，因为在新构建的索引中，逻辑上相邻的页面通常也是物理上相邻的。（这种考虑目前不适用于非 B 树索引。）为了提高访问速度，定期重新索引可能是值得的。

Which does seem to conflict with the page you referenced saying that indexes "don't require maintenance or tuning".

这似乎与您引用的页面冲突，说索引“不需要维护或调整”。

Have you tried "create index concurrently"?

您是否尝试过“同时创建索引”？

Answer 4

回答by Stephen Denne

Is the '2010-05-20T13:00:00.000' value that xmlscheduledtime is being compared to, part of the SQL, or supplied as a parameter?

与 xmlscheduledtime 比较的 '2010-05-20T13:00:00.000' 值是 SQL 的一部分，还是作为参数提供？

When planning how to run the query, saying that a field must be less than a supplied parameter with an as yet unknown value doesn't give PostgreSQL much to go on. It doesn't know whether that'll match nearly all the rows, or hardly any of the rows.

在计划如何运行查询时，说一个字段必须小于提供的参数并且具有未知的值并不会让 PostgreSQL 有太多的事情要做。它不知道这是否会匹配几乎所有的行，或者几乎没有任何行。

Reading about how the planner uses statisticshelps tremendously when trying to figure out why your database is using the plans it is.

阅读有关规划器如何使用统计数据的信息，在尝试弄清楚您的数据库为何使用它的计划时非常有帮助。

You might get better select performance by changing the order of fields in that complex index, or creating a new index, with the fields ordered (campaignfqname, currentstate, xmlscheduledtime) since then the index will take you straight to the campaign fq name and current state that you are interested in, and the index scan over the xmlscheduledtime range will all be rows you're after.

您可能会通过更改该复杂索引中字段的顺序或创建新索引来获得更好的选择性能，其中字段已排序（campaignfqname、currentstate、xmlscheduledtime），此后索引将直接带您进入活动 fq 名称和当前状态您感兴趣的，并且在 xmlscheduledtime 范围内的索引扫描都将是您所追求的行。

Answer 5

回答by peufeu

That's a textbook case. You should setup autovacuum to be a lot more aggressive.

那是教科书般的案例。您应该将 autovacuum 设置得更加激进。

为什么 PostgresQL 查询性能会随着时间的推移而下降，但重建索引时会恢复

提问by Jim Rush

采纳答案by Timothy

回答by ákos Kiss

回答by Chris Curvey

回答by Stephen Denne

回答by peufeu

相关推荐

最近更新

标签

为什么 PostgresQL 查询性能会随着时间的推移而下降，但重建索引时会恢复

提问by Jim Rush

采纳答案by Timothy

回答by ákos Kiss

回答by Chris Curvey

回答by Stephen Denne

回答by peufeu

相关推荐

使用 PostgreSQL 数据库作为简单键值存储的最佳方式

在 MAC OS X 10.6 for PostgreSQL 上设置 SHMMAX 等值

Postgresql - 使用带有改变序列表达式的子查询

C# 和 PostgreSQL

相关推荐

最近更新

标签