Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/9650290/

Time: 2020-10-20 23:43:19  Source: igfitidea

Very slow Postgres UPDATE on large table

Tags: postgresql, sql-update

Asked by Aren Cambre

I have a Postgres 9.1.3 table with 2.06 million rows after WHERE Y=1, as shown below (without any WHERE, the table has only a few tens of thousands more rows). I am trying to add data to an empty field with a query like this:

WITH B AS (
    SELECT Z,
           rank() OVER (ORDER BY L, N, M, P) AS X
    FROM   A
    WHERE  Y=1
)

UPDATE A
SET X = B.X  -- Postgres does not allow a table-qualified target column in SET
FROM B
WHERE A.Y=1
  AND B.Z = A.Z;

This query runs for hours and appears to progress very slowly. In fact, the second time I tried this, I had a power outage after the query ran for ~3 hours. After restoring power, I analyzed the table and got this:

INFO:  analyzing "consistent.master"
INFO:  "master": scanned 30000 of 69354 pages, containing 903542 live rows and 153552 dead rows; 30000 rows in sample, 2294502 estimated total rows
Total query runtime: 60089 ms.

Is it correct to interpret that the query had barely progressed in those hours?
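
As a cross-check on the ANALYZE numbers above, the statistics collector keeps its own live and dead row counters. Dead rows are row versions left behind by updates, deletes, or aborted transactions, so an interrupted UPDATE is one plausible source of them. The following is only a sketch: it assumes the table really is consistent.master, as the ANALYZE output suggests, and the counters may have been reset by the unclean shutdown.

```sql
-- Assumption: schema and table names are taken from the ANALYZE output above.
-- n_dead_tup is an estimate, not an exact count.
SELECT n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze
FROM   pg_stat_user_tables
WHERE  schemaname = 'consistent'
  AND  relname    = 'master';
```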

I ran VACUUM FULL and ANALYZE before running the long query.

The query within the WITH takes only 40 seconds.

All fields referenced above except A.X, and by extension B.X, are indexed: L, M, N, P, Y, Z.

This is being run on a laptop with 8 GB RAM, a Core i7 Q720 1.6 GHz quad core processor, and Windows 7 x64. I am running Postgres 32 bit for compatibility with PostGIS 1.5.3. 64 bit PostGIS for Windows isn't available yet. (32 bit Postgres means it can't use more than 2 GB RAM in Windows, but I doubt that's an issue here.)

Here's the result of EXPLAIN:

Update on A  (cost=727684.76..945437.01 rows=2032987 width=330)
  CTE B
    ->  WindowAgg  (cost=491007.50..542482.47 rows=2058999 width=43)
          ->  Sort  (cost=491007.50..496155.00 rows=2058999 width=43)
                Sort Key: A.L, A.N, A.M, A.P
                ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=43)
                      Filter: (Y = 1)
  ->  Hash Join  (cost=185202.29..402954.54 rows=2032987 width=330)
        Hash Cond: ((B.Z)::text = (A.Z)::text)
        ->  CTE Scan on B  (cost=0.00..41179.98 rows=2058999 width=88)
        ->  Hash  (cost=85066.80..85066.80 rows=2058999 width=266)
              ->  Seq Scan on A  (cost=0.00..85066.80 rows=2058999 width=266)
                    Filter: (Y = 1)

Answered by maniek

There are several possible causes and workarounds.

  • The update could be blocked on a lock. Consult the pg_locks view.
  • Maybe there are triggers on A? They could be the reason for the slowdown.
  • Try EXPLAIN UPDATE ...: is the plan significantly different from the plan of the plain SELECT? You could also do it in two steps: export B to a table, then update from that table.
  • Try dropping the indexes before the update.
  • Create a new table, drop the old one, and rename the new table to the old table's name.
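
The first three suggestions in the list above can be sketched in SQL roughly as follows. This is a minimal sketch, not a tested fix: A, X, Y, Z, L, N, M, and P are the placeholder names from the question, while b_ranked and b_ranked_z_idx are hypothetical names invented here. Whether the two-step variant is actually faster depends on the data and configuration.

```sql
-- 1. Is the UPDATE waiting on a lock?
--    Each row returned is a lock request that has not been granted yet.
SELECT locktype, relation::regclass AS relation, mode, pid
FROM   pg_locks
WHERE  NOT granted;

-- 2. Are there user-defined triggers on A?
--    tgisinternal filters out triggers that Postgres created for constraints.
SELECT tgname, tgenabled
FROM   pg_trigger
WHERE  tgrelid = 'A'::regclass
  AND  NOT tgisinternal;

-- 3. Two-step variant: materialize B into a table first (hypothetical name b_ranked) ...
CREATE TEMP TABLE b_ranked AS
SELECT Z,
       rank() OVER (ORDER BY L, N, M, P) AS X
FROM   A
WHERE  Y = 1;

-- ... give the planner an index and fresh statistics on it ...
CREATE INDEX b_ranked_z_idx ON b_ranked (Z);
ANALYZE b_ranked;

-- ... then update from that table. The SET target is unqualified on purpose.
UPDATE A
SET    X = b_ranked.X
FROM   b_ranked
WHERE  A.Y = 1
  AND  b_ranked.Z = A.Z;
```

If the first query returns no rows while the UPDATE is running, lock waits are probably not the cause, and the trigger and plan checks become more interesting.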

Answered by dpetruha

Try to rewrite the query like this:

UPDATE A
SET X = B.X  -- Postgres does not allow a table-qualified target column in SET
FROM B
WHERE A.Y=1
      AND B.Z = A.Z
      AND A.X IS DISTINCT FROM B.X;
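
The extra condition relies on IS DISTINCT FROM being a NULL-safe "not equal", which the short illustration below shows; the column aliases are made up for this example. Note the consequence for this question: rows where A.X is still NULL always count as distinct from a computed rank, so the filter can only skip rows whose X already holds the right value, for instance when the statement is re-run after a partial, committed update.

```sql
-- IS DISTINCT FROM always yields true or false, never NULL.
SELECT NULL::int IS DISTINCT FROM 1    AS null_vs_one,       -- true
       1 IS DISTINCT FROM 1            AS one_vs_one,        -- false
       NULL::int IS DISTINCT FROM NULL AS null_vs_null,      -- false
       (NULL::int <> 1) IS NULL        AS plain_ne_is_null;  -- true: <> yields NULL
```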