PostgreSQL: CASE vs multiple UPDATE queries for large data sets - Performance

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/9875145/

Time: 2020-10-20 23:46:14  Source: igfitidea

CASE vs Multiple UPDATE queries for large data sets - Performance

Tags: sql, performance, postgresql, case, sql-update

Asked by Phill Pafford

For performance, which option is better when a large data set has to be updated?

A single UPDATE with a CASE expression, or individual UPDATE queries?

CASE Example:

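-- One pass over the table; for each row the first matching WHEN wins.
UPDATE tbl_name SET field_name =
CASE
    WHEN condition_1 THEN 'Blah'
    WHEN condition_2 THEN 'Foo'
    WHEN condition_x THEN '123'  -- quoted so every branch has the same type
    ELSE 'bar'
END;  -- no "AS value" alias: an UPDATE assignment cannot take one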

Individual Query Example:

-- condition_1 ... condition_y stand for the same boolean predicates as in the CASE example
UPDATE tbl_name SET field_name = 'Blah' WHERE condition_1;
UPDATE tbl_name SET field_name = 'Foo' WHERE condition_2;
UPDATE tbl_name SET field_name = '123' WHERE condition_x;
UPDATE tbl_name SET field_name = 'bar' WHERE condition_y;

NOTE: About 300,000 records are going to be updated, and the CASE statement would have about 10,000 WHEN conditions. With individual queries, there would be about 10,000 statements as well.

Answered by JNK

The CASE version.

This is because there is a good chance that the individual statements alter the same row more than once. If row 10 matches both condition_1 and condition_y, it will need to be read and altered twice. If you have a clustered index, this means two clustered index updates, on top of the updates to whatever other fields were modified.

If you can do it as a single statement, each row will be read only once and it should run much quicker.

About a year ago I changed a similar process that ran dozens of UPDATE statements in sequence so that it used a single UPDATE with CASE, and processing time dropped by about 80%.
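
The point about rows being written more than once can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, so it only shows the write counts, not PostgreSQL or SQL Server timings. The columns category and flag, the write_log table, and the log_write trigger are hypothetical names introduced for the illustration; category = 'a' plays the role of condition_1 and flag = 1 the role of condition_y.

```python
import sqlite3

# Hypothetical schema: a trigger logs every row write into write_log.
SCHEMA = """
    CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, category TEXT,
                           flag INTEGER, field_name TEXT);
    CREATE TABLE write_log (row_id INTEGER);
    CREATE TRIGGER log_write AFTER UPDATE ON tbl_name
    BEGIN
        INSERT INTO write_log (row_id) VALUES (NEW.id);
    END;
"""

# Row 3 matches both conditions; row 2 matches neither.
ROWS = [(1, "a", 0), (2, "b", 0), (3, "a", 1)]

SEQUENTIAL = [
    "UPDATE tbl_name SET field_name = 'Blah' WHERE category = 'a'",
    "UPDATE tbl_name SET field_name = 'bar' WHERE flag = 1",
]

SINGLE_CASE = [
    """UPDATE tbl_name SET field_name = CASE
           WHEN category = 'a' THEN 'Blah'
           WHEN flag = 1 THEN 'bar'
           ELSE field_name
       END
       WHERE category = 'a' OR flag = 1""",
]


def writes_per_row(statements):
    """Run the statements on a fresh table and return {row id: write count}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tbl_name (id, category, flag) VALUES (?, ?, ?)", ROWS
    )
    for sql in statements:
        conn.execute(sql)
    counts = dict(
        conn.execute("SELECT row_id, COUNT(*) FROM write_log GROUP BY row_id")
    )
    conn.close()
    return counts


sequential_counts = writes_per_row(SEQUENTIAL)
single_counts = writes_per_row(SINGLE_CASE)
print("sequential:", sequential_counts)
print("single CASE:", single_counts)
```

In this sketch the sequential statements write row 3 twice, while the single statement writes each matching row once. Note that the two approaches are not always equivalent: with overlapping conditions the last sequential UPDATE wins, whereas CASE takes the first matching WHEN.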

Answered by Diego

It seems logical to me that with the first option SQL Server will go through the table only once and evaluate the conditions for each row.

With the second, it will have to go through the whole table 4 times.

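So, for a table with 1000 rows, the first option means 1000 condition evaluations in the best case (every row matches the first WHEN) and 3000 in the worst case (every row falls through all three WHENs). The second option always means 4000 evaluations, one per row for each of the four statements.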

So option 1 would be faster.
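
The counting argument above can be checked with a small model. This is only a sketch under stated assumptions: three WHEN branches plus an ELSE (so four separate UPDATE statements), no index on the condition columns so every separate statement tests every row, and a CASE that stops at the first matching WHEN. The function names are hypothetical, and a real query planner may behave differently.

```python
def case_evaluations(rows, conditions):
    """Count WHEN tests for one pass with a CASE that stops at the first match."""
    total = 0
    for row in rows:
        for condition in conditions:
            total += 1
            if condition(row):
                break
    return total


def separate_update_evaluations(rows, statement_count):
    """Count WHERE tests when each statement scans every row (no index assumed)."""
    return len(rows) * statement_count


rows = list(range(1000))

# Best case: every row matches the first WHEN.
best_case = [lambda r: True, lambda r: False, lambda r: False]
# Worst case: no WHEN matches, so every row falls through to ELSE.
worst_case = [lambda r: False, lambda r: False, lambda r: False]

best = case_evaluations(rows, best_case)
worst = case_evaluations(rows, worst_case)
separate = separate_update_evaluations(rows, 4)

print(best, worst, separate)  # prints: 1000 3000 4000
```

The model ignores I/O and indexes. If each WHERE clause could use an index, the separate statements would not need a full scan each time, so these numbers are an upper-bound style comparison rather than a measurement.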

Answered by Seph

As pointed out by Mitch, try making a temp table and filling it with all the data you need, using a separate temp table for each column (field) you want to change. You should also add an index to the temp table(s) for a further performance improvement.

This way your update statement becomes (more or less):

UPDATE tbl_name SET
    field_name = COALESCE((SELECT value FROM temp_tbl WHERE tbl_name.conditional_field = temp_tbl.condition_value), field_name),
    field_name2 = COALESCE((SELECT value FROM temp_tbl2 WHERE tbl_name.conditional_field2 = temp_tbl2.condition_value), field_name2);

and so on.

This should give you good performance and scale to large volumes of updates in a single statement.
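
As a rough end-to-end illustration of the temp-table approach, the sketch below runs the same kind of UPDATE against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in here; the sample rows and the mapping values are made up for the example, and the PRIMARY KEY on condition_value serves as the index on the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_name (id INTEGER PRIMARY KEY,
                           conditional_field TEXT,
                           field_name TEXT);
    INSERT INTO tbl_name (id, conditional_field, field_name) VALUES
        (1, 'a', 'old1'),
        (2, 'b', 'old2'),
        (3, 'z', 'old3');

    -- Lookup table: one row per condition instead of one WHEN per condition.
    -- The PRIMARY KEY gives the lookup an index.
    CREATE TEMP TABLE temp_tbl (condition_value TEXT PRIMARY KEY, value TEXT);
    INSERT INTO temp_tbl (condition_value, value) VALUES
        ('a', 'Blah'),
        ('b', 'Foo');
""")

# Rows without a match in temp_tbl keep their current value via COALESCE.
conn.execute("""
    UPDATE tbl_name SET field_name = COALESCE(
        (SELECT value FROM temp_tbl
         WHERE tbl_name.conditional_field = temp_tbl.condition_value),
        field_name)
""")

result = conn.execute("SELECT id, field_name FROM tbl_name ORDER BY id").fetchall()
print(result)  # prints: [(1, 'Blah'), (2, 'Foo'), (3, 'old3')]
conn.close()
```

With about 10,000 conditions, the lookup table would hold about 10,000 rows and the UPDATE text would stay the same size. In PostgreSQL the same lookup could also be written as a join with UPDATE ... FROM, which this sketch does not show.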
