Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6903938/

Time: 2020-09-10 23:02:45  Source: igfitidea

How do I know if the statistics of a Postgres table are up to date?

Tags: postgresql, statistics, analyzer, vacuum

Asked by Beibei

In pgAdmin, whenever a table's statistics are out-of-date, it prompts:

Running VACUUM recommended

The estimated rowcount on the table schema.table deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.

I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.

Let's say I'm making a web-based system in Java. How do I detect whether a table's statistics are out-of-date, so that I can show a prompt like the one in pgAdmin?

Because of the nature of my application, here are a few rules I have to follow:

  1. I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.

  2. I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)

  3. I can't run vacuum/analyze every time to make it up-to-date.

  4. When a user selects a table, I need to show the prompt that the table is outdated when there are any updates to this table (such as drop, insert, and update) that are not reflected in pg_stats and pg_statistic.

Checking the timestamps in pg_catalog.pg_stat_all_tables does not seem to be enough. If a table has never been analyzed, last_analyze is empty, so I can tell that its statistics are missing. Once a timestamp exists, however, I can't tell whether the statistics are still current: no matter how many rows I add to the table, last_analyze keeps the time of the first analyze (assuming the autovacuum flag is off). As a result, I can only show the "Running VACUUM recommended" prompt the first time.

Comparing the last_analyze timestamp to the current timestamp doesn't work either: the table might see no updates for days, or tons of updates within one hour.

Given this scenario, how can I always tell if the statistics of a table are up-to-date?

Answered by Sean

Check the system catalogs.

test=# SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'city';
 schemaname | relname |         last_analyze          
------------+---------+-------------------------------
 pagila     | city    | 2011-07-26 19:30:59.357898-07
 world      | city    | 2011-07-26 19:30:53.119366-07
(2 rows)

All kinds of useful information in there:

test=# \d pg_stat_all_tables
           View "pg_catalog.pg_stat_all_tables"
      Column       |           Type           | Modifiers 
-------------------+--------------------------+-----------
 relid             | oid                      | 
 schemaname        | name                     | 
 relname           | name                     | 
 seq_scan          | bigint                   | 
 seq_tup_read      | bigint                   | 
 idx_scan          | bigint                   | 
 idx_tup_fetch     | bigint                   | 
 n_tup_ins         | bigint                   | 
 n_tup_upd         | bigint                   | 
 n_tup_del         | bigint                   | 
 n_tup_hot_upd     | bigint                   | 
 n_live_tup        | bigint                   | 
 n_dead_tup        | bigint                   | 
 last_vacuum       | timestamp with time zone | 
 last_autovacuum   | timestamp with time zone | 
 last_analyze      | timestamp with time zone | 
 last_autoanalyze  | timestamp with time zone | 
 vacuum_count      | bigint                   | 
 autovacuum_count  | bigint                   | 
 analyze_count     | bigint                   | 
 autoanalyze_count | bigint                   |
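
One way an application could use these counters is to mimic the pgAdmin message from the question, which compares the estimated row count with the actual one. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, the tolerance values are arbitrary, and it assumes the caller has already fetched the planner's estimate (for example pg_class.reltuples, which is not part of the view above) and a current row count (for example n_live_tup from the view, or SELECT count(*)). It is not pgAdmin's actual rule.

```java
// Hypothetical helper: names and tolerances are illustrative assumptions,
// not pgAdmin's real heuristic.
final class StatsDriftCheck {
    private StatsDriftCheck() {}

    /**
     * Returns true when the live row count deviates noticeably from the estimate.
     *
     * @param estimatedRows planner estimate, e.g. pg_class.reltuples (assumed input)
     * @param liveRows      current row count, e.g. n_live_tup or SELECT count(*)
     * @param relTolerance  allowed relative deviation, e.g. 0.10 for 10%
     * @param minAbsDiff    differences smaller than this many rows are ignored
     */
    static boolean looksStale(double estimatedRows, long liveRows,
                              double relTolerance, long minAbsDiff) {
        double diff = Math.abs(liveRows - estimatedRows);
        if (diff < minAbsDiff) {
            return false; // small tables: a handful of rows is not worth a prompt
        }
        // Avoid dividing by zero when the table was estimated as empty.
        double base = Math.max(estimatedRows, 1.0);
        return diff / base > relTolerance;
    }

    public static void main(String[] args) {
        // Estimate of 1000 rows, 1500 live rows: 50% deviation.
        System.out.println(looksStale(1000, 1500, 0.10, 50));
        // Estimate of 1000 rows, 1020 live rows: below the absolute floor.
        System.out.println(looksStale(1000, 1020, 0.10, 50));
    }
}
```

Note that a check like this only catches changes in row count. Updates that leave the count unchanged but shift the data distribution would go unnoticed, so it is at best a partial answer to rule 4 in the question.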

Answered by atrain

You should not have to worry about vacuuming in your application. Instead, you should have the autovacuum process configured on your server (in postgresql.conf), and the server takes care of the VACUUM and ANALYZE processes based on its own internal statistics. You can configure how often it should run, and what the threshold variables are for it to process.
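
For reference, the threshold that autovacuum applies before analyzing a table is documented as autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples, with defaults of 50 and 0.1. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and it assumes the application can supply the number of tuples changed since the last analyze (the n_tup_ins, n_tup_upd and n_tup_del counters are cumulative, so the application would have to work out that delta itself).

```java
// Hypothetical helper reproducing the documented autovacuum analyze threshold.
// How changedTuples is obtained is left to the caller (an assumption of this sketch).
final class AnalyzeThreshold {
    static final long DEFAULT_BASE_THRESHOLD = 50;   // autovacuum_analyze_threshold default
    static final double DEFAULT_SCALE_FACTOR = 0.1;  // autovacuum_analyze_scale_factor default

    private AnalyzeThreshold() {}

    /** threshold = base threshold + scale factor * estimated number of tuples */
    static double threshold(long base, double scale, double relTuples) {
        return base + scale * relTuples;
    }

    /** True when the number of changed tuples exceeds the analyze threshold. */
    static boolean wouldTriggerAnalyze(long changedTuples, long base,
                                       double scale, double relTuples) {
        return changedTuples > threshold(base, scale, relTuples);
    }

    public static void main(String[] args) {
        // With the defaults and a table estimated at 1000 rows,
        // more than 150 changed tuples are needed to exceed the threshold.
        System.out.println(threshold(DEFAULT_BASE_THRESHOLD, DEFAULT_SCALE_FACTOR, 1000));
        System.out.println(wouldTriggerAnalyze(151, DEFAULT_BASE_THRESHOLD, DEFAULT_SCALE_FACTOR, 1000));
    }
}
```

An application that cannot rely on autovacuum being enabled could apply the same formula to decide when to show its own prompt, though the values of the two settings on a given server may differ from the defaults.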
