How to track query progress in PostgreSQL?

Question

Asked by Mohammad Fajar

Is there a plugin or a script that can track the progress of a long-running query in PostgreSQL?

I mean I need to set a progress bar value in Java that reflects the progress of an update query in Postgres. I searched the internet, but I only found some papers that have no official implementation in any RDBMS.

Answer 1

Answered by jjrv

I found a good answer here: Tracking progress of an update statement

The trick is to first create a sequence (name it as you like):

CREATE SEQUENCE query_progress START 1;

Then append this to your query's WHERE clause:

AND NEXTVAL('query_progress')!=0

Now you can query the progress:

SELECT NEXTVAL('query_progress');

Finally don't forget to get rid of the sequence:

DROP SEQUENCE query_progress;

Note that this will most likely make your query run even slower, and every time you check the progress it will additionally increment the value. The above link suggested creating a temporary sequence, but PostgreSQL doesn't seem to make them visible across sessions.
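
As a rough sketch of how a second session might poll this from client code: the helper below reads the sequence's last_value column instead of calling NEXTVAL, so checking the progress does not advance the counter. It is written in Python for brevity. The names fraction_done and poll_progress and the total_rows estimate are hypothetical, and cursor is assumed to be any DB-API cursor opened on a separate connection from the one running the update.

```python
def fraction_done(rows_seen: int, total_rows: int) -> float:
    """Clamp rows_seen / total_rows into the range [0.0, 1.0]."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return max(0.0, min(1.0, rows_seen / total_rows))


def poll_progress(cursor, total_rows: int, sequence_name: str = "query_progress") -> float:
    """Read the sequence without incrementing it and return the fraction done.

    Assumption: sequence_name is a trusted identifier, because it is
    interpolated directly into the SQL text.
    """
    cursor.execute(f"SELECT last_value FROM {sequence_name}")
    (last_value,) = cursor.fetchone()
    return fraction_done(int(last_value), total_rows)
```

Here total_rows would have to come from somewhere else, for example a count(*) with the same WHERE clause run before the update starts. The result is only an estimate, since the sequence counts rows the WHERE clause has examined rather than rows already written.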

Answer 2

Answered by Yunting Zhao

I have figured out a way that might help, but further processing may be needed if you want to implement it in your own code, such as Java.

The way is to examine the page content in order to track the progress.

PostgreSQL has an extension called pageinspect that can examine the page information of a particular table.

Details here: https://www.postgresql.org/docs/current/pageinspect.html

Also spend some time understanding PostgreSQL's page layout here:

https://www.postgresql.org/docs/current/storage-page-layout.html

Look at xmin, xmax and ctid in particular.

I am assuming that rows are inserted into the table in a certain order, such as the table's pkey, and that any long update will likely append new pages.

I am also assuming that the primary key id is mostly continuous, with few gaps. Since this is just an estimate, I think that condition is acceptable.

You cannot find out the total page count by running SELECT relname, relpages FROM pg_class though, since that value is not updated.

You will get an exception if the page index does not exist in storage (but you will find the page even if it is not yet reflected in pg_class), so do a little "binary search" on page_index to find the largest page you have. It doesn't need to be exact.
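
The "binary search" on the page index could look roughly like the sketch below. It assumes a hypothetical callable page_exists(page_index), which would wrap a get_raw_page call and return False when that call raises an error. It also assumes that every page below the last one exists.

```python
from typing import Callable


def find_last_page(page_exists: Callable[[int], bool]) -> int:
    """Return the largest page index for which page_exists is true, or -1.

    Assumes pages are contiguous: every index up to the last one exists.
    """
    if not page_exists(0):
        return -1
    # Grow the upper bound exponentially until we step past the end.
    low, high = 0, 1
    while page_exists(high):
        low, high = high, high * 2
    # Invariant: page `low` exists and page `high` does not.
    while high - low > 1:
        mid = (low + high) // 2
        if page_exists(mid):
            low = mid
        else:
            high = mid
    return low
```

Each probe costs one query, so the search needs on the order of twice log2 of the page count in round trips. Since the answer only needs an approximate last page, the second loop could also stop early.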

Use

SELECT backend_xid FROM pg_stat_activity WHERE pid = process-id

to find your current transaction ID.

Use

SELECT lp,t_xmin,t_xmax,t_ctid,t_bits,t_data FROM heap_page_items(get_raw_page('relation_name', page_index));

In the sample I am working on, it may look like this:

SELECT lp,t_xmin,t_xmax,t_ctid,t_bits,t_data FROM heap_page_items(get_raw_page('foo', 3407000));
lp | t_xmin | t_xmax | t_ctid | t_bits | t_data
1 | 592744 | 592744 | (3407000,1) | 110000000111000000000000 | \xd1100000000000000e4400000000000054010000611b0000631b0000
2 | 592744 | 592744 | (3407000,2) | 110000000111000000000000 | \xd110000000000000104400000000000040010000611b0000631b0000
3 | 592744 | 592744 | (3407000,3) | 110000000111000000000000 | \xd11000000000000011440000000000007c010000611b0000631b0000

t_data is the row data. lp is the tuple's index in the page's item list. t_xmin and t_xmax are transaction IDs. t_ctid is the pointer to the tuple, stored within the tuple itself. t_bits is the NULL bitmap, present if the tuple contains null values.

First check whether t_xmin = t_xmax, and whether t_ctid (page_index, tuple_id) matches the page index and lp. If so, check whether t_xmin is the same as your transaction ID. If so, check the data.
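
The checks described above can be sketched as a small filter. This is only an illustration: it assumes each heap_page_items row has already been fetched into a dict with integer t_xmin and t_xmax and with t_ctid parsed into a (page, item) tuple. The function name is_candidate_tuple is hypothetical.

```python
def is_candidate_tuple(item: dict, page_index: int, xid: int) -> bool:
    """Apply the three checks from the text to one heap_page_items row."""
    same_xact = item["t_xmin"] == item["t_xmax"]
    points_to_itself = item["t_ctid"] == (page_index, item["lp"])
    written_by_us = item["t_xmin"] == xid
    return same_xact and points_to_itself and written_by_us
```

With the sample output above, a row such as lp = 1, t_xmin = t_xmax = 592744, t_ctid = (3407000, 1) passes when the backend's transaction ID is 592744.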

Be aware of endianness and the NULL bitmap. In my case, it is little-endian (LSB first).

In my example, the first row is valid, and the first BIGINT (8 bytes, 16 hex digits) is the sorted id I am looking for. So for the first row the data is

\xd110000000000000

which translates to 0x10d1 (check endianness) --> 4305
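
To make the byte-order step concrete: the bytes d1 10 read least-significant first give 0x10d1, which is 4305. A minimal sketch of that decoding follows, assuming the t_data value arrives as a hex string with an optional leading \x and that the id is the first 8-byte column. The name first_bigint is hypothetical.

```python
def first_bigint(t_data_hex: str) -> int:
    """Decode the first 8 bytes of a t_data hex string as a little-endian integer."""
    # Drop the leading backslash-x that PostgreSQL prints for bytea values.
    hex_digits = t_data_hex[2:] if t_data_hex.startswith("\\x") else t_data_hex
    # 8 bytes are 16 hex digits; the least significant byte comes first.
    raw = bytes.fromhex(hex_digits[:16])
    return int.from_bytes(raw, byteorder="little", signed=True)


print(first_bigint("\\xd110000000000000"))  # prints 4305
```

On a big-endian server the byteorder argument would need to change, which is why the answer says to check endianness.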

I know my largest id is 18209 and my smallest id is 2857, and I separated the job into 8 parts, so

(18209 - 2857) / 8 = 1919
This is the first part I ran, so
2857 + 1919 = 4776

This means that my sub-job starts at id 2857 and is currently at 4305. When it hits 4776, this thread is done!

This is

(4305 - 2857) / 1919 = 75.5% done
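
The arithmetic above can be wrapped in a small helper so that each of the 8 sub-jobs reports its own fraction. This is a sketch under the same assumptions (ids mostly continuous, equal-sized parts). The function and parameter names are hypothetical.

```python
def sub_job_progress(current_id: int, smallest_id: int, largest_id: int,
                     parts: int, part_number: int = 0) -> float:
    """Return the fraction of one sub-job that has been processed so far."""
    chunk = (largest_id - smallest_id) // parts     # ids per sub-job, 1919 here
    start = smallest_id + part_number * chunk       # first id of this sub-job
    return (current_id - start) / chunk


# First sub-job of 8, currently at id 4305.
print(f"{sub_job_progress(4305, 2857, 18209, 8):.1%}")  # prints 75.5%
```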

Limitations

This will not work with hash value updates. In my case, the ids happen to be ordered sequentially as the pkey, and the planner triggers a sequential read. This should also work if the planner does some sort of btree index scan for the update.

Look into CLUSTER if you are interested in ordering the physical rows in index order.

Again, this method is not exact, and it relies on the assumptions highlighted above. If used in a program, it should be used sparingly to avoid extra disk I/O overhead.

Answer 3

Answered by Richard Huxton

No. There is no way to track the "live" progress of a query. In theory, the system could compare top-level progress versus the query-plan, and emit some sort of percentage readout. In practice, I doubt it would be terribly accurate and I doubt the performance impact would be worthwhile.

Answer 4

Answered by Lajos Arpad

You can add an update_time column to your table, holding the time of the last update. If you know somehow which records should be affected, then your update can also set their update_time to the current time. When you check the progress and you know the number of rows to be affected, you can count the records whose update_time is newer than the time when you started the update. Number of affected rows having a "new" update_time / number of records to update * 100 gives you the progress percentage.
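
A minimal sketch of that check follows. It assumes a DB-API cursor that accepts %s placeholders (as psycopg2 does), a trusted table name, and a rows_to_update count obtained beforehand. The name update_time_progress is hypothetical. It also assumes the update is committed in batches, because other sessions cannot see rows changed by a transaction that has not committed yet.

```python
def update_time_progress(cursor, table: str, started_at, rows_to_update: int) -> float:
    """Return the percentage of rows whose update_time is at or after started_at.

    Assumption: `table` is a trusted identifier, because it is interpolated
    into the SQL text; started_at is passed as a bound parameter.
    """
    cursor.execute(
        f"SELECT count(*) FROM {table} WHERE update_time >= %s",
        (started_at,),
    )
    (updated,) = cursor.fetchone()
    return 100.0 * updated / rows_to_update
```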
