Update million rows using rowids from one table to another Oracle

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17156137/

Asked by Raj A
Hi, I have two tables with a million rows in each, on Oracle 11g R1. I am sure many of us have run into this situation.
What is the most efficient and fastest way to update from one table to another where the values are DIFFERENT?
E.g.: Table 1 has 4 NUMBER columns with high precision, e.g. 0.2212454215454212.
Table 2 has 6 columns. Update four of Table 2's columns based on a column common to both tables, and only the rows whose values differ.
I have something like this:
DECLARE
   TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_INTEGER;
   TYPE test2_t IS TABLE OF test.id%TYPE    INDEX BY PLS_INTEGER;
   TYPE test3_t IS TABLE OF test.crank%TYPE INDEX BY PLS_INTEGER;

   vscore test1_t;
   vid    test2_t;
   vurank test3_t;
BEGIN
   -- load the source table into the collections in one pass
   SELECT id, score, urank
     BULK COLLECT INTO vid, vscore, vurank
     FROM test;

   -- one MERGE per collected row, bound in bulk
   FORALL i IN 1 .. vid.COUNT
      MERGE INTO final T
      USING (SELECT vid(i)    AS o_id,
                    vurank(i) AS o_urank,
                    vscore(i) AS o_score
               FROM DUAL) S
      ON (S.o_id = T.id)
      WHEN MATCHED THEN
         UPDATE SET T.crank = S.o_urank
                WHERE T.crank <> S.o_urank;
END;
Since the numbers are stored with high precision, is that slowing it down?
I tried a Bulk Collect and MERGE combination, but it still takes around 30 minutes in the worst-case scenario when I have to update 1 million rows.
Is there something that can be done with rowid? Help will be appreciated.
Answered by David Aldridge
If you want to update all the rows, then just use update:
update table_1
set    (col1, col2) = (select col1, col2
                       from   table2
                       where  table2.col_a = table_1.col_a and
                              table2.col_b = table_1.col_b)
Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.
The numeric precision is probably not significant, and rowid is not relevant as there is no common value between the two tables.
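Note that if some table_1 rows have no match in table2, the plain update above sets their col1 and col2 to NULL, because the correlated subquery returns no rows for them. A hedged sketch of a guarded variant, assuming a match is not guaranteed for every row:

update table_1
set    (col1, col2) = (select col1, col2
                       from   table2
                       where  table2.col_a = table_1.col_a and
                              table2.col_b = table_1.col_b)
-- only touch rows that actually have a match, so the rest are not nulled out
where  exists (select null
               from   table2
               where  table2.col_a = table_1.col_a and
                      table2.col_b = table_1.col_b)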
Answered by APC
When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
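A minimal sketch of what that could look like for the question's tables (the degree of 8 is an assumption; tune it to your hardware):

ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(t 8) */ INTO final t
USING (SELECT id, urank FROM test) s
ON (t.id = s.id)
WHEN MATCHED THEN
   UPDATE SET t.crank = s.urank
          WHERE t.crank <> s.urank;

-- a parallel DML transaction must be committed before the session
-- can query the modified table again (ORA-12838 otherwise)
COMMIT;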
I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.
The article does have some neat suggestions on employing parallel. The surprising outcome is that a parallel pipelined function offers the best performance.
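For illustration, a minimal sketch of that approach adapted to the question's tables; this is not the article's exact code, and the object type names and the autonomous-transaction commit are assumptions of this sketch:

CREATE TYPE upd_row_t AS OBJECT (id NUMBER);
/
CREATE TYPE upd_row_tab AS TABLE OF upd_row_t;
/
CREATE OR REPLACE FUNCTION parallel_update (p_cur SYS_REFCURSOR)
   RETURN upd_row_tab
   PIPELINED
   PARALLEL_ENABLE (PARTITION p_cur BY ANY)
AS
   PRAGMA AUTONOMOUS_TRANSACTION; -- allows DML inside a function called from SQL
   l_id    test.id%TYPE;
   l_score test.score%TYPE;
   l_urank test.urank%TYPE;
BEGIN
   LOOP
      FETCH p_cur INTO l_id, l_score, l_urank;
      EXIT WHEN p_cur%NOTFOUND;
      UPDATE final
         SET crank = l_urank
       WHERE id = l_id
         AND crank <> l_urank;
      PIPE ROW (upd_row_t(l_id)); -- report each id handled by this slave
   END LOOP;
   COMMIT;
   RETURN;
END;
/
-- driven in parallel by a plain query:
SELECT COUNT(*)
  FROM TABLE(parallel_update(CURSOR(
         SELECT /*+ PARALLEL(test) */ id, score, urank FROM test)));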
Answered by Mohsen Heydari
Focusing on the syntax that has been used and skipping the logic (perhaps a pure update plus a pure insert may solve the problem; also consider the cost of MERGE, indexes, a possible full scan on merge, and so on):

You should use the LIMIT clause in the BULK COLLECT syntax. Using a bulk collect with no limit:
- will cause all records to be loaded into memory
- with no partially committed merges, will create a large redo log that must be applied at the end of the process

Both will result in low performance.
DECLARE
   v_fetchSize NUMBER := 1000; -- batch size; could be scaled based on hardware, design, ...

   CURSOR a_cur IS
      SELECT id, score, urank FROM test;

   TYPE myarray IS TABLE OF a_cur%ROWTYPE;
   cur_array myarray;
BEGIN
   OPEN a_cur;
   LOOP
      FETCH a_cur BULK COLLECT INTO cur_array LIMIT v_fetchSize;
      FORALL i IN 1 .. cur_array.COUNT
         -- DO Operation (exactly one DML statement, e.g. the question's MERGE)
      COMMIT; -- commit each batch to keep undo/redo small
      EXIT WHEN a_cur%NOTFOUND;
   END LOOP;
   CLOSE a_cur;
END;
Answered by ThinkJet
Just to be sure: test.id and final.id must be indexed.

With the first select ... from test you get too many records from Table 1, and after that you have to compare all of them with the records in Table 2. Try to select only what you need to update. So, there are at least 2 variants:
a) select only changed records:
SELECT source_table.id, source_table.score, source_table.urank
  BULK COLLECT INTO vid, vscore, vurank
  FROM test  source_table,
       final destination_table
 WHERE source_table.id = destination_table.id
   AND source_table.crank <> destination_table.crank;
b) Add a new datetime field to the source table and fill it with the current time in a trigger. When synchronizing, pick only the records changed during the last day. This field needs to be indexed.
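A hedged sketch of that change (the column name modified_at and the trigger name are made up for illustration):

ALTER TABLE test ADD (modified_at DATE);
CREATE INDEX test_modified_ix ON test (modified_at);

CREATE OR REPLACE TRIGGER test_modified_trg
   BEFORE INSERT OR UPDATE ON test
   FOR EACH ROW
BEGIN
   :NEW.modified_at := SYSDATE; -- stamp every changed row with the current time
END;
/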
After such a change, in the update phase you don't need to compare the other fields, only match the IDs:
FORALL i IN 1 .. vid.COUNT
   MERGE INTO final T
   USING (SELECT vid(i)    AS o_id,
                 vurank(i) AS o_urank,
                 vscore(i) AS o_score
            FROM DUAL) S
   ON (S.o_id = T.id)
   WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank;
If you worry about the size of the undo/redo segments then variant b) is more useful, because you can take the records from the source Table 1 divided into time slices and commit the changes after updating every slice, e.g. from 00:00 to 01:00, from 01:00 to 02:00, and so on. In this variant the update can be done by a plain SQL statement, without selecting data into collections, while maintaining acceptable sizes of the redo/undo logs.