postgresql 如何确保物化视图始终是最新的？

Question

提问by John Bachir

I'll need to invoke REFRESH MATERIALIZED VIEWon each change to the tables involved, right? I'm surprised to not find much discussion of this on the web.

我需要对REFRESH MATERIALIZED VIEW所涉及的表格进行每次更改，对吗？我很惊讶在网上没有找到太多关于这个的讨论。

How should I go about doing this?

我该怎么做呢？

I think the top half of the answer here is what I'm looking for: https://stackoverflow.com/a/23963969/168143

我认为这里答案的上半部分是我正在寻找的：https: //stackoverflow.com/a/23963969/168143

Are there any dangers to this? If updating the view fails, will the transaction on the invoking update, insert, etc. be rolled back? (this is what I want... I think)

这有什么危险吗？如果更新视图失败，调用更新、插入等的事务会回滚吗？（这就是我想要的……我想）

Answer 1

回答by MatheusOl

I'll need to invoke REFRESH MATERIALIZED VIEWon each change to the tables involved, right?

我需要对REFRESH MATERIALIZED VIEW所涉及的表格进行每次更改，对吗？

Yes, PostgreSQL by itself will never call it automatically, you need to do it some way.

是的，PostgreSQL 本身永远不会自动调用它，您需要以某种方式调用它。

How should I go about doing this?

我该怎么做呢？

Many ways to achieve this. Before giving some examples, keep in mind that REFRESH MATERIALIZED VIEWcommanddoes block the view in AccessExclusive mode, so while it is working, you can't even do SELECTon the table.

实现这一目标的方法很多。在举一些例子之前，请记住该REFRESH MATERIALIZED VIEW命令在 AccessExclusive 模式下确实会阻止视图，因此在它工作时，您甚至不能SELECT在桌子上做。

Although, if you are in version 9.4 or newer, you can give it the CONCURRENTLYoption:

但是，如果您使用的是 9.4 或更高版本，则可以为其提供以下CONCURRENTLY选项：

REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;

This will acquire an ExclusiveLock, and will not block SELECTqueries, but may have a bigger overhead (depends on the amount of data changed, if few rows have changed, then it might be faster). Although you still can't run two REFRESHcommands concurrently.

这将获得一个 ExclusiveLock，并且不会阻塞SELECT查询，但可能会有更大的开销（取决于更改的数据量，如果更改的行很少，那么它可能会更快）。尽管您仍然无法同时运行两个REFRESH命令。

Refresh manually

手动刷新

It is an option to consider. Specially in cases of data loading or batch updates (e.g. a system that only loads tons of information/data after long periods of time) it is common to have operations at end to modify or process the data, so you can simple include a REFRESHoperation in the end of it.

这是一个可以考虑的选项。特别是在数据加载或批量更新的情况下（例如，一个系统在很长一段时间后才加载大量信息/数据），通常会在末尾进行操作来修改或处理数据，因此您可以简单地将REFRESH操作包含在它的结束。

Scheduling the REFRESH operation

调度 REFRESH 操作

The first and widely used option is to use some scheduling system to invoke the refresh, for instance, you could configure the like in a cron job:

第一个也是广泛使用的选项是使用一些调度系统来调用刷新，例如，您可以在 cron 作业中配置类似的内容：

*/30 * * * * psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv"

And then your materialized view will be refreshed at each 30 minutes.

然后您的物化视图将每 30 分钟刷新一次。

Considerations

注意事项

This option is really good, specially with CONCURRENTLYoption, but only if you can accept the data not being 100% up to date all the time. Keep in mind, that even with or without CONCURRENTLY, the REFRESHcommand does need to run the entire query, so you have to take the time needed to run the inner query before considering the time to schedule the REFRESH.

这个选项真的很好，特别是有CONCURRENTLY选项，但前提是你可以接受数据不是 100% 一直是最新的。请记住，即使有或没有CONCURRENTLY，该REFRESH命令确实需要运行整个查询，因此在考虑安排REFRESH.

Refreshing with a trigger

用触发器刷新

Another option is to call the REFRESH MATERIALIZED VIEWin a trigger function, like this:

另一种选择是REFRESH MATERIALIZED VIEW在触发器函数中调用，如下所示：

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
    RETURN NULL;
END;
$$;

Then, in any table that involves changes on the view, you do:

然后，在涉及视图更改的任何表中，您执行以下操作：

CREATE TRIGGER tg_refresh_my_mv AFTER INSERT OR UPDATE OR DELETE
ON table_name
FOR EACH STATEMENT EXECUTE PROCEDURE tg_refresh_my_mv();

Considerations

注意事项

It has some critical pitfalls for performance and concurrency:

它在性能和并发性方面存在一些关键缺陷：

Any INSERT/UPDATE/DELETE operation will have to execute the query (which is possible slow if you are considering MV);
Even with CONCURRENTLY, one REFRESHstill blocks another one, so any INSERT/UPDATE/DELETE on the involved tables will be serialized.

任何 INSERT/UPDATE/DELETE 操作都必须执行查询（如果您正在考虑 MV，这可能会很慢）；
即使使用CONCURRENTLY，一个REFRESH仍然会阻塞另一个，因此涉及的表上的任何 INSERT/UPDATE/DELETE 都将被序列化。

The only situation I can think that as a good idea is if the changes are really rare.

我能想到的唯一一种好主意是，如果这些变化真的很少见。

Refresh using LISTEN/NOTIFY

使用 LISTEN/NOTIFY 刷新

The problem with the previous option is that it is synchronous and impose a big overhead at each operation. To ameliorate that, you can use a trigger like before, but that only calls a NOTIFYoperation:

前一个选项的问题在于它是同步的，并且在每个操作中都会产生很大的开销。为了改善这种情况，您可以像以前一样使用触发器，但它只会调用一个NOTIFY操作：

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, 'my_mv';
    RETURN NULL;
END;
$$;

So then you can build an application that keep connected and uses LISTENoperationto identify the need to call REFRESH. One nice project that you can use to test this is pgsidekick, with this project you can use shell script to do LISTEN, so you can schedule the REFRESHas:

因此，您可以构建一个保持连接的应用程序并使用LISTEN操作来识别调用REFRESH. 一个可以用来测试的不错的项目是pgsidekick，在这个项目中你可以使用 shell 脚本来做LISTEN，所以你可以安排REFRESH如下：

pglisten --listen=refresh_mv --print0 | xargs -0 -n1 -I? psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY ?;"

Or use pglater(also inside pgsidekick) to make sure you don't call REFRESHvery often. For example, you can use the following trigger to make it REFRESH, but within 1 minute (60 seconds):

或者使用pglater（也在 inside pgsidekick）来确保你不REFRESH经常打电话。例如，您可以使用以下触发器来实现REFRESH，但在 1 分钟（60 秒）内：

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, '60 REFRESH MATERIALIZED VIEW CONCURRENLTY my_mv';
    RETURN NULL;
END;
$$;

So it will not call REFRESHin less the 60 seconds apart, and also if you NOTIFYmany times in less than 60 seconds, the REFRESHwill be triggered only once.

所以它不会REFRESH在少于 60 秒的时间内调用，而且如果你NOTIFY在少于 60 秒内多次调用，REFRESH只会触发一次。

Considerations

注意事项

As the cron option, this one also is good only if you can bare with a little stale data, but this has the advantage that the REFRESHis called only when really needed, so you have less overhead, and also the data is updated more closer to when needed.

作为 cron 选项，只有当您可以保留一些陈旧数据时，这个选项也很好，但是这样做的优点REFRESH是仅在真正需要时才调用它，因此您的开销较小，并且数据更新得更接近需要的时候。

OBS: I haven't really tried the codes and examples yet, so if someone finds a mistake, typo or tries it and works (or not), please let me know.

OBS：我还没有真正尝试过代码和示例，所以如果有人发现错误、打字错误或尝试并成功（或失败），请告诉我。

Answer 2

回答by Park JongBum

Let me point out three things on the previous answer by MatheusOl - the pglater technology.

让我在 MatheusOl 之前的回答中指出三件事 - pglater 技术。

As the last element of long_options array it should include "{0, 0, 0, 0}" element as pointed at https://linux.die.net/man/3/getopt_longby the phrase "The last element of the array has to be filled with zeros." So, it should read -
```
static struct option long_options[] =     {
      //......
      {"help", no_argument, NULL, '?'},
      {0, 0, 0, 0} 
};
```
On the malloc/free thing -- one free(for char listen = malloc(...);) is missing. Anyhow, malloc caused pglater process to crash on CentOS (but not on Ubuntu - I don't know why). So, I recommend using char array and assign the array name to the char pointer(to both charand char**). You many need to force type conversion while you do that(pointer assignment).
```
char block4[100];
...
password_prompt = block4;
...
char block1[500];
const char **keywords = (const char **)&block1;
...
char block3[300];
char *listen = block3;
sprintf(listen, "listen %s", id);
PQfreemem(id);
res = PQexec(db, listen);
```
Use below table to calculate timeout where md is mature_duration which is the time difference between the latest refresh(lr) time point and current time.
when md >= callback_delay(cd) ==> timeout: 0
when md + PING_INTERVAL >= cd ==> timeout: cd-md[=cd-(now-lr)]
when md + PING_INTERVAL < cd ==> timeout: PI

作为 long_options 数组的最后一个元素，它应该包含“{0, 0, 0, 0}”元素，如https://linux.die.net/man/3/getopt_long 中的短语“数组的最后一个元素必须用零填充。” 所以，它应该读-
```
static struct option long_options[] =     {
      //......
      {"help", no_argument, NULL, '?'},
      {0, 0, 0, 0} 
};
```
关于 malloc/free 的事情——一个 free(for char listen = malloc(...);) 丢失了。无论如何， malloc 导致 pglater 进程在 CentOS 上崩溃（但不是在 Ubuntu 上 - 我不知道为什么）。因此，我建议使用 char 数组并将数组名称分配给 char 指针（同时指向 char和 char**）。您在执行此操作时需要强制进行类型转换（指针分配）。
```
char block4[100];
...
password_prompt = block4;
...
char block1[500];
const char **keywords = (const char **)&block1;
...
char block3[300];
char *listen = block3;
sprintf(listen, "listen %s", id);
PQfreemem(id);
res = PQexec(db, listen);
```
使用下表计算超时，其中 md 是成熟的持续时间，即最新的 refresh(lr) 时间点与当前时间之间的时间差。
当 md >= callback_delay(cd) ==> 超时：0
当 md + PING_INTERVAL >= cd ==> 超时：cd-md[=cd-(now-lr)]
当 md + PING_INTERVAL < cd ==> 超时：PI

To implement this algorithm(3rd point), you should init 'lr' as follows -

要实现此算法（第三点），您应该按如下方式初始化 'lr' -

res = PQexec(db, command);
latest_refresh = time(0);
if (PQresultStatus(res) == PGRES_COMMAND_OK) {

res = PQexec(db, command);
latest_refresh = time(0);
if (PQresultStatus(res) == PGRES_COMMAND_OK) {

postgresql 如何确保物化视图始终是最新的？

提问by John Bachir

回答by MatheusOl

Refresh manually

手动刷新

Scheduling the REFRESH operation

调度 REFRESH 操作

Considerations

注意事项

Refreshing with a trigger

用触发器刷新

Considerations

注意事项

Refresh using LISTEN/NOTIFY

使用 LISTEN/NOTIFY 刷新

Considerations

注意事项

回答by Park JongBum

相关推荐

最近更新

标签

postgresql 如何确保物化视图始终是最新的？

提问by John Bachir

回答by MatheusOl

Refresh manually

手动刷新

Scheduling the REFRESH operation

调度 REFRESH 操作

Considerations

注意事项

Refreshing with a trigger

用触发器刷新

Considerations

注意事项

Refresh using LISTEN/NOTIFY

使用 LISTEN/NOTIFY 刷新

Considerations

注意事项

回答by Park JongBum

相关推荐

如何通过命令行启动 Windows 调度程序？

PostgreSQL 错误致命：角色“用户名”不存在

如何在 Windows Vista 上禁用“调试/关闭应用程序”对话框？

获取错误：在部署到 heroku 时，您必须至少安装一个 postgresql-client-<version> 包

相关推荐

最近更新

标签