CASE vs Multiple UPDATE queries for large data sets - Performance (postgresql)
Source: http://stackoverflow.com/questions/9875145/
License: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Phill Pafford
For performance what option would be better for large data sets that are to be updated?
Using a CASE statement or Individual update queries?
CASE Example:
UPDATE tbl_name SET field_name =
    CASE
        WHEN condition_1 THEN 'Blah'
        WHEN condition_2 THEN 'Foo'
        WHEN condition_x THEN '123'  -- quoted so every branch yields the same type
        ELSE 'bar'
    END;
Individual Query Example:
UPDATE tbl_name SET field_name = 'Blah' WHERE field_name = condition_1;
UPDATE tbl_name SET field_name = 'Foo' WHERE field_name = condition_2;
UPDATE tbl_name SET field_name = 123 WHERE field_name = condition_x;
UPDATE tbl_name SET field_name = 'bar' WHERE field_name = condition_y;
NOTE: About 300,000 records are going to be updated, and the CASE statement would have about 10,000 WHEN conditions. With individual queries, there would be about 10,000 statements as well.
Answered by JNK
The CASE version.
This is because there is a good chance you are altering the same row more than once with the individual statements. If row 10 matches both condition_1 and condition_y, it will need to be read and altered twice. If you have a clustered index, this means two clustered index updates on top of whatever other field(s) were modified.
If you can do it as a single statement, each row will be read only once and it should run much quicker.
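To illustrate the point about rows being touched more than once, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the sample values ('a', 'b', 'c', 'z') and the helper `new_table` are made up for the demonstration. It counts how many row writes each approach performs via `cursor.rowcount`.

```python
import sqlite3

def new_table():
    """Return a fresh in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, field_name TEXT)")
    conn.executemany(
        "INSERT INTO tbl_name (field_name) VALUES (?)",
        [("a",), ("a",), ("b",), ("z",)],
    )
    return conn

def values(conn):
    """Read back field_name for every row, in id order."""
    return [r[0] for r in conn.execute("SELECT field_name FROM tbl_name ORDER BY id")]

# Individual statements: rows set to 'b' by the first UPDATE also match
# the second UPDATE, so those rows are written a second time.
conn = new_table()
writes_individual = 0
for old, new in [("a", "b"), ("b", "c")]:
    cur = conn.execute(
        "UPDATE tbl_name SET field_name = ? WHERE field_name = ?", (new, old)
    )
    writes_individual += cur.rowcount
individual_result = values(conn)

# Single statement: every matching row is written exactly once.
conn = new_table()
cur = conn.execute(
    """
    UPDATE tbl_name SET field_name = CASE
        WHEN field_name = 'a' THEN 'b'
        WHEN field_name = 'b' THEN 'c'
        ELSE field_name
    END
    WHERE field_name IN ('a', 'b')
    """
)
writes_case = cur.rowcount
case_result = values(conn)

print(writes_individual, individual_result)  # 5 ['c', 'c', 'c', 'z']
print(writes_case, case_result)              # 3 ['b', 'b', 'c', 'z']
```

With this contrived data the final values also differ, because the second individual UPDATE sees the output of the first one. Whether that matters depends on how the real conditions overlap.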
About a year ago I changed a similar process, which ran dozens of UPDATE statements in sequence, to use a single UPDATE with CASE, and processing time dropped about 80%.
Answered by Diego
It seems logical to me that with the first option SQL Server will go through the table only once, evaluating the conditions for each row.
With the second, it will have to go through the whole table 4 times.
So, for a table with 1000 rows, the first option needs 1000 evaluations in the best case (every row matches the first WHEN) and 3000 in the worst case (every row falls through all three WHEN conditions). The second option always needs 4000 evaluations.
So option 1 would be faster.
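The counting argument above can be checked with a small simulation. This is only a sketch of the arithmetic: it assumes every statement does a full table scan with no index, and the function names are made up for the illustration. A real query planner may behave differently.

```python
def case_evaluations(rows, when_conditions):
    """Count condition checks for one pass using a short-circuiting CASE."""
    count = 0
    for row in rows:
        for cond in when_conditions:
            count += 1
            if cond(row):
                break  # CASE stops at the first WHEN that matches
    return count

def separate_update_evaluations(rows, where_conditions):
    """Count condition checks when each UPDATE scans every row once."""
    return len(rows) * len(where_conditions)

rows = list(range(1000))

# Best case for CASE: every row matches the first of the three WHENs.
best = case_evaluations(rows, [lambda r: True, lambda r: True, lambda r: True])

# Worst case for CASE: no row matches any WHEN, so all three are checked.
worst = case_evaluations(rows, [lambda r: False, lambda r: False, lambda r: False])

# Four individual UPDATE statements, each with its own WHERE clause.
separate = separate_update_evaluations(rows, [lambda r: False] * 4)

print(best, worst, separate)  # 1000 3000 4000
```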
Answered by Seph
As pointed out by Mitch, try making a temp table and filling it with all the data you need, with a separate temp table for each column (field) you want to change. You should also add an index to the temp table(s) for a further performance improvement.
This way your update statement becomes (more or less):
UPDATE tbl_name SET
    field_name = COALESCE((SELECT value FROM temp_tbl WHERE tbl_name.conditional_field = temp_tbl.condition_value), field_name),
    field_name2 = COALESCE((SELECT value FROM temp_tbl2 WHERE tbl_name.conditional_field2 = temp_tbl2.condition_value), field_name2)
and so on.
This should give you good performance while scaling up for large volumes of updates at once.
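Here is a runnable sketch of the temp-table approach for a single column. It again uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The sample keys ('k1', 'k2', 'k9') and values are invented for the example, and the index is supplied by declaring condition_value as the temp table's primary key rather than by a separate CREATE INDEX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_name (
        id INTEGER PRIMARY KEY,
        conditional_field TEXT,
        field_name TEXT
    );
    INSERT INTO tbl_name (conditional_field, field_name) VALUES
        ('k1', 'old1'), ('k2', 'old2'), ('k9', 'old9');

    -- Lookup table: one row per value that would otherwise be a WHEN branch.
    -- The PRIMARY KEY gives the lookup column an index.
    CREATE TEMP TABLE temp_tbl (
        condition_value TEXT PRIMARY KEY,
        value TEXT
    );
    INSERT INTO temp_tbl (condition_value, value) VALUES
        ('k1', 'Blah'), ('k2', 'Foo');
    """
)

# One UPDATE: rows with a match in temp_tbl take the new value,
# rows without a match keep their current field_name via COALESCE.
conn.execute(
    """
    UPDATE tbl_name SET field_name = COALESCE(
        (SELECT value FROM temp_tbl
         WHERE temp_tbl.condition_value = tbl_name.conditional_field),
        field_name
    )
    """
)

result = conn.execute(
    "SELECT conditional_field, field_name FROM tbl_name ORDER BY id"
).fetchall()
print(result)  # [('k1', 'Blah'), ('k2', 'Foo'), ('k9', 'old9')]
```

The 10,000 mappings then live in data rather than in the statement text, so the UPDATE itself stays the same size however many values there are.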