Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/3361291/

Slow simple update query on PostgreSQL database with 3 million rows

Tags: sql, postgresql, sql-update

Asked by Ricardo

I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on Postgres 8.4, but it is taking forever to finish. It has now been running for more than 10 minutes in my latest attempt.

Before that, I tried running the VACUUM and ANALYZE commands on the table, and I also tried creating some indexes (although I doubt they will make any difference in this case), but none of it seems to help.

Any other ideas?

Thanks, Ricardo

Update:

This is the table structure:

CREATE TABLE myTable
(
  id bigserial NOT NULL,
  title text,
  description text,
  link text,
  "type" character varying(255),
  generalFreq real,
  generalWeight real,
  author_id bigint,
  status_id bigint,
  CONSTRAINT resources_pkey PRIMARY KEY (id),
  CONSTRAINT author_pkey FOREIGN KEY (author_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT c_unique_status_id UNIQUE (status_id)
);

I am trying to run UPDATE myTable SET generalFreq = 0;

Accepted answer by Frank Heikens

Take a look at this answer: PostgreSQL slow on a large table with arrays and lots of updates

First, start with a better FILLFACTOR and do a VACUUM FULL to force a table rewrite. Then, after your UPDATE query, check the HOT updates:

SELECT n_tup_hot_upd, * FROM pg_stat_user_tables WHERE relname = 'myTable';
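
A minimal sketch of the FILLFACTOR and rewrite steps (50 is an assumed value; pick one that leaves enough free space per page for your update pattern):

ALTER TABLE myTable SET (fillfactor = 50);  -- leave room in each page so updated rows can stay on-page (HOT)
VACUUM FULL myTable;                        -- rewrite the table so existing pages honor the new fillfactor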

HOT updates are much faster when you have a lot of records to update. More information about HOT can be found in this article.

PS: You need version 8.3 or newer.

Answer by Le Droid

I have to update tables of 1 or 2 billion rows, with various values for each row. Each run makes ~100 million changes (10%). My first try was to group them into transactions of 300K updates directly on a specific partition, since PostgreSQL does not always optimize prepared queries if you use partitions.

  1. Transactions of a bunch of "UPDATE myTable SET myField=value WHERE myId=id" statements.
    Gives 1,500 updates/sec, which means each run would take at least 18 hours.
  2. HOT updates solution, as described here, with FILLFACTOR=50.
    Gives 1,600 updates/sec. I use SSDs, so it's a costly improvement, as it doubles the storage size.
  3. Insert the updated values into a temporary table and merge them afterwards with UPDATE...FROM.
    Gives 18,000 updates/sec if I do a VACUUM for each partition; 100,000 updates/sec otherwise. Cooool.
    Here is the sequence of operations:


CREATE TEMP TABLE tempTable (
  id BIGINT NOT NULL,
  generalFreq REAL,  -- stand-in for the field(s) to be updated
  CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

Accumulate a bunch of updates in a buffer, sized according to the available RAM. When the buffer is full, or when you need to switch table/partition, or when you are done:

COPY tempTable FROM STDIN;  -- bulk-load the buffered rows (or COPY ... FROM a file)
UPDATE myTable a SET generalFreq = b.generalFreq  -- i.e. field(s) = value(s)
FROM tempTable b WHERE a.id = b.id;
COMMIT;
TRUNCATE TABLE tempTable;
VACUUM FULL ANALYZE myTable;

That means a run now takes 1.5 hours instead of 18 for 100 million updates, vacuum included. To save time, it's not necessary to do a VACUUM FULL at the end, but even a fast regular vacuum is useful to keep the database's transaction IDs under control and avoid unwanted autovacuum during rush hours.

Answer by Ricardo

After waiting 35 minutes for my UPDATE query to finish (and it still hadn't), I decided to try something different. So what I ran was this command:

CREATE TABLE table2 AS
SELECT
  -- all the fields of myTable except the one I wanted to update:
  id, title, description, link, "type", generalWeight, author_id, status_id,
  0::real AS generalFreq  -- the field to update, preset to 0
FROM myTable;

Then I added indexes, dropped the old table, and renamed the new one to take its place. That took only 1.7 minutes to process, plus some extra time to recreate the indexes and constraints. But it did help! :)

Of course, that worked only because nobody else was using the database. I would need to lock the table first if this were a production environment.

Answer by Tregoreg

Today I've spent many hours on a similar issue. I've found a solution: drop all the constraints/indexes before the update. No matter whether the column being updated is indexed or not, Postgres seems to update all the indexes for all the updated rows. After the update is finished, add the constraints/indexes back.
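
A sketch of that sequence against the question's table; the plain index name here is hypothetical, only c_unique_status_id comes from the DDL above:

DROP INDEX IF EXISTS mytable_author_idx;                 -- hypothetical index name
ALTER TABLE myTable DROP CONSTRAINT c_unique_status_id;
UPDATE myTable SET generalFreq = 0;
ALTER TABLE myTable ADD CONSTRAINT c_unique_status_id UNIQUE (status_id);
CREATE INDEX mytable_author_idx ON myTable (author_id);  -- recreate afterwards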

Answer by Fabiano Bonin

Try this (note that generalFreq starts as type REAL and stays the same):

ALTER TABLE myTable ALTER COLUMN generalFreq TYPE REAL USING 0;

This will rewrite the table, similar to a DROP + CREATE, and rebuild all the indexes, but all in one command. It is much faster (about 2x), and you don't have to deal with dependencies and recreating indexes and other stuff, though it does lock the table (ACCESS EXCLUSIVE, i.e. a full lock) for the duration. Or maybe that's what you want, if you want everything else to queue up behind it. If you aren't updating "too many" rows, this way is slower than a plain update.

Answer by rogerdpack

The first thing I'd suggest (from https://dba.stackexchange.com/questions/118178/does-updating-a-row-with-the-same-value-actually-update-the-row) is to only update rows that "need" it, e.g.:

 UPDATE myTable SET generalFreq = 0 where generalFreq != 0;

(You might also need an index on generalFreq.) Then you'll update fewer rows. This doesn't apply if the values are all non-zero already, but updating fewer rows "can help", since otherwise Postgres updates the rows and all their indexes regardless of whether the value changed or not.
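
For that index, a partial index is one option; a sketch, with a made-up name, that only covers the rows still needing the update:

CREATE INDEX mytable_generalfreq_nonzero ON myTable (generalFreq) WHERE generalFreq != 0;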

Another option: if the stars align in terms of defaults and not-null constraints, you can drop the old column and create another one by just adjusting metadata, in instant time.
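
A rough sketch of that idea (caveat: on old versions such as the question's 8.4, ADD COLUMN with a DEFAULT rewrites the whole table; only PostgreSQL 11+ keeps a constant default as metadata only):

ALTER TABLE myTable DROP COLUMN generalFreq;                -- metadata-only, instant
ALTER TABLE myTable ADD COLUMN generalFreq real DEFAULT 0;  -- metadata-only on PostgreSQL 11+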

Answer by Rolintocour

In my tests I noticed that a big update of more than 200,000 rows is slower than two updates of 100,000 rows each, even with a temporary table.

My solution is to loop: in each iteration, I create a temporary table of 200,000 rows, compute my values in that table, then update my main table with the new values, and so on...
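
One iteration of that loop might look like the sketch below; the 0::real is a stand-in for whatever per-row computation you actually do, and you would add a WHERE clause to pick the next unprocessed slice:

CREATE TEMP TABLE batch AS
  SELECT id, 0::real AS newFreq  -- compute the new values here
  FROM myTable
  LIMIT 200000;                  -- the batch size from this answer

UPDATE myTable a SET generalFreq = b.newFreq
FROM batch b WHERE a.id = b.id;

DROP TABLE batch;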

Every 2,000,000 rows I manually run VACUUM ANALYZE mytable; I noticed that autovacuum doesn't do its job for such updates.

Answer by Tom Gullen

How are you running it? If you are looping over each row and performing an update statement per row, you are potentially running millions of individual updates, which is why it performs incredibly slowly.

If you are updating all the records in a single update statement, it will run a lot faster, and if that process is still slow then it's probably down to your hardware more than anything else. 3 million is a lot of records.

Answer by Chocolim

Try:

UPDATE myTable SET generalFreq = 0.0;

Maybe it is a casting issue.
