postgresql: CASE vs. multiple UPDATE queries for large data sets - performance

Question

Asked by Phill Pafford

For performance what option would be better for large data sets that are to be updated?

Using a CASE statement or individual UPDATE queries?

CASE Example:

UPDATE tbl_name SET field_name =
CASE
    WHEN condition_1 THEN 'Blah'
    WHEN condition_2 THEN 'Foo'
    WHEN condition_x THEN '123'  -- all branches must resolve to one type
    ELSE 'bar'
END;

Individual Query Example:

UPDATE tbl_name SET field_name = 'Blah' WHERE field_name = condition_1;
UPDATE tbl_name SET field_name = 'Foo' WHERE field_name = condition_2;
UPDATE tbl_name SET field_name = 123 WHERE field_name = condition_x;
UPDATE tbl_name SET field_name = 'bar' WHERE field_name = condition_y;

NOTE: About 300,000 records are going to be updated, and the CASE statement would have about 10,000 WHEN conditions. With individual queries, there would be about 10,000 statements as well.

Answer 1

Answered by JNK

The CASE version.

This is because there is a good chance that the individual statements will alter the same row more than once. If row 10 satisfies both condition_1 and condition_y, it will need to be read and altered twice. If you have a clustered index, this means two clustered index updates on top of the updates to whatever other field(s) were modified.

If you can do it as a single statement, each row will be read only once and it should run much quicker.

About a year ago I changed a similar process that ran dozens of UPDATE statements in sequence to use a single UPDATE with CASE, and processing time dropped by about 80%.
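
The double-write problem described above can be sketched in PostgreSQL as follows. The table `demo_items` and its columns are hypothetical, invented only for illustration. Row 2 satisfies both predicates, so the sequential form writes it twice, while the single CASE form writes each matching row once.

```sql
-- Hypothetical table for illustration; not from the original question.
CREATE TEMP TABLE demo_items (
    id     integer PRIMARY KEY,
    status text    NOT NULL,
    qty    integer NOT NULL
);

INSERT INTO demo_items (id, status, qty) VALUES
    (1, 'new', 5),
    (2, 'new', 50),
    (3, 'old', 50);

-- Sequential form: row 2 matches both statements and is rewritten twice.
BEGIN;
UPDATE demo_items SET status = 'fresh' WHERE status = 'new';  -- rows 1 and 2
UPDATE demo_items SET status = 'bulk'  WHERE qty >= 50;       -- rows 2 and 3
ROLLBACK;  -- undo, so the single-statement form starts from the same data

-- Single-statement form: each matching row is written once.
-- CASE returns the first matching branch, so the branch that the
-- sequential form applied last is listed first to get the same result.
UPDATE demo_items
SET status = CASE
    WHEN qty >= 50      THEN 'bulk'
    WHEN status = 'new' THEN 'fresh'
END
WHERE qty >= 50 OR status = 'new';  -- leave non-matching rows untouched
```

The WHERE clause on the single statement matters: without it, rows that match no branch would still be rewritten.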

Answer 2

Answered by Diego

It seems logical to me that with the first option SQL Server will go through the table only once and evaluate the conditions for each row.

With the second, it will have to go through the whole table 4 times.

So, for a table with 1000 rows, the first option means 1000 evaluations in the best case and 3000 in the worst case. The second option will always need 4000 evaluations.

So option 1 would be faster.
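
Rather than counting evaluations by hand, the two options can be measured. Below is a rough sketch for PostgreSQL (the question's tag), using a hypothetical 1000-row table `demo_scan` that is not part of the original post. `EXPLAIN ANALYZE` really executes the statement, so each run is wrapped in a transaction and rolled back.

```sql
-- Hypothetical 1000-row table; grp takes the values 0..3.
CREATE TEMP TABLE demo_scan (
    id    integer PRIMARY KEY,
    grp   integer NOT NULL,
    label text
);

INSERT INTO demo_scan (id, grp)
SELECT g, g % 4
FROM generate_series(1, 1000) AS g;

-- Option 1: one pass over the table with CASE.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE demo_scan
SET label = CASE grp
    WHEN 0 THEN 'a'
    WHEN 1 THEN 'b'
    WHEN 2 THEN 'c'
    ELSE 'd'
END;
ROLLBACK;

-- Option 2: four separate statements, each planned and run on its own.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS) UPDATE demo_scan SET label = 'a' WHERE grp = 0;
EXPLAIN (ANALYZE, BUFFERS) UPDATE demo_scan SET label = 'b' WHERE grp = 1;
EXPLAIN (ANALYZE, BUFFERS) UPDATE demo_scan SET label = 'c' WHERE grp = 2;
EXPLAIN (ANALYZE, BUFFERS) UPDATE demo_scan SET label = 'd' WHERE grp = 3;
ROLLBACK;
```

Comparing the reported scan nodes, buffer counts, and total times shows how many passes each option makes on a given data set. If the condition column is indexed, the planner may avoid a full scan for the individual statements, so the gap can be smaller than the simple count suggests.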

Answer 3

Answered by Seph

As pointed out by Mitch, try making a temp table and filling it with all the data you need; make a separate temp table for each column (field) you want to change. You should also add an index to the temp table(s) for a further performance improvement.

This way your update statement becomes (more or less):

UPDATE tbl_name SET field_name = COALESCE((SELECT value FROM temp_tbl WHERE tbl_name.conditional_field = temp_tbl.condition_value), field_name),
    field_name2 = COALESCE((SELECT value FROM temp_tbl2 WHERE tbl_name.conditional_field2 = temp_tbl2.condition_value), field_name2)

and so on.

This should give you good performance while scaling up for large volumes of updates at once.
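
In PostgreSQL the same lookup-table idea can also be written as a join with `UPDATE ... FROM`, which touches only rows that have a match instead of rewriting every row with its old value. This is a minimal sketch, not the answer's own code; the names `demo_target`, `demo_map`, `cond` and `new_value` are hypothetical.

```sql
-- Hypothetical target table standing in for tbl_name.
CREATE TEMP TABLE demo_target (
    id         integer PRIMARY KEY,
    cond       integer NOT NULL,
    field_name text
);

INSERT INTO demo_target (id, cond, field_name)
SELECT g, g % 100, 'bar'
FROM generate_series(1, 1000) AS g;

-- Lookup table: one row per former WHEN branch.
CREATE TEMP TABLE demo_map (
    cond      integer PRIMARY KEY,  -- the primary key also provides the index
    new_value text NOT NULL
);

INSERT INTO demo_map (cond, new_value) VALUES
    (1, 'Blah'),
    (2, 'Foo'),
    (3, '123');

-- Temp tables are not processed by autovacuum, so collect statistics manually.
ANALYZE demo_map;

-- One statement; rows without a match in demo_map are not touched.
UPDATE demo_target AS t
SET field_name = m.new_value
FROM demo_map AS m
WHERE t.cond = m.cond
  AND t.field_name IS DISTINCT FROM m.new_value;  -- skip rows already up to date
```

With around 10,000 mappings, the lookup table replaces the 10,000 WHEN branches, and the join lets the planner choose how to match rows rather than testing branches one by one.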
