Oracle SQL 通过使用游标和单次提交高效插入大数据集
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11764873/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Oracle SQL insert large data set efficiently via cursor use and single commit
提问by Woot4Moo
The following SQL generates all matching records between two tables that have identical schemas and then proceeds to iterate over the cursor that stores this result set. I do a row by row insert with a commit at the end of this function. My question is how can I get the maximum performance from this type of query? Code follows:
以下 SQL 在具有相同模式的两个表之间生成所有匹配记录,然后继续迭代存储此结果集的游标。我逐行插入并在此函数结束时提交。我的问题是如何从此类查询中获得最大性能?代码如下:
BEGIN
  DECLARE
    CURSOR foo IS
      SELECT * FROM tableOne to
       WHERE EXISTS (SELECT * FROM tableTwo tt
                      WHERE TO.FOO = TT.FOO
                        AND TO.BAR = TT.BAR); --THIS TAKES 5 MINUTES (66 MILLION ROWS)
  BEGIN
    FOR nextFoo IN foo
    LOOP
      INSERT INTO tracker t
        (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, nextFoo.foo, nextFoo.bar, nextFoo.baz);
    END LOOP;
    COMMIT;
  END;
END;
This query can take upwards of an hour and I am trying to reduce the time cost associated with it. I will be processing 140 million records in general so I am expecting to double the amount of time this process takes. All columns are indexed.
此查询可能需要一个小时以上的时间,我正在尝试减少与之相关的时间成本。我将处理 1.4 亿条记录,因此我预计此过程所需的时间会增加一倍。所有列都被索引。
Version information:
版本信息:
10g 10.2
10g 10.2
采纳答案by Woot4Moo
I have found the following will do 130 million inserts in about 49 minutes.
我发现以下内容将在大约 49 分钟内完成 1.3 亿次插入。
INSERT INTO tracker t
SELECT * FROM tableOne to
 WHERE NOT EXISTS (SELECT * FROM tableTwo tt
                    WHERE TO.FOO = TT.FOO
                      AND TO.BAR = TT.BAR);
回答by Markus Mikkolainen
how about
怎么样
INSERT INTO tracker t SELECT trackerSequence.nextVal
,foo
,bar
,baz
FROM tableOne to
INNER JOIN tabletwo tt
ON (to.foo = tt.foo and to.bar=tt.bar);
I wonder if that would be optimized better.
我想知道这是否会更好地优化。
Also make sure the tracker table's indexes are disabled while inserting.
还要确保在插入期间禁用 tracker 表上的索引。
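For illustration, a hedged sketch of that approach, assuming a hypothetical index named TRACKER_FOO_BAR_IDX on the tracker table; an index that backs a unique or primary key constraint would need the constraint disabled instead:

ALTER INDEX tracker_foo_bar_idx UNUSABLE;        -- stop maintaining the index during the load
ALTER SESSION SET skip_unusable_indexes = TRUE;  -- let DML proceed despite the unusable index

-- ... run the bulk INSERT into tracker here ...

ALTER INDEX tracker_foo_bar_idx REBUILD;         -- rebuild once, after the load completes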
回答by Mike
OK, I know you wanted the cursor...
好的,我知道你想要用游标...
The only real advantage to using the cursor is being able to commit every 10k or so rows when processing that much data, to avoid filling the logs up.
使用游标的唯一真正优势是在处理这么多数据时可以每 1 万行左右提交一次,以避免把日志填满。
Unless you really need the cursor, eliminate the row processing.
除非你真的需要游标,否则去掉逐行处理。
insert into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where
t1.foo = t2.foo and
t1.bar = t2.bar;
Direct path insert hint, as suggested:
按照建议,使用直接路径插入提示:
insert /*+ append */ into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where
t1.foo = t2.foo and
t1.bar = t2.bar;
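One usage note worth adding here (general Oracle behaviour, not something stated in the original answer): after a direct-path (APPEND) insert, the same session cannot read or modify the table again until it commits, so the COMMIT should follow the load immediately.

COMMIT;  -- querying tracker again in this session before committing would raise ORA-12838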
回答by Gaurav Soni
DECLARE
  CURSOR foo_cur IS
    SELECT * FROM tableOne TO
     WHERE EXISTS (SELECT * FROM tableTwo tt
                    WHERE TO.FOO = TT.FOO
                      AND TO.BAR = TT.BAR); --THIS TAKES 5 MINUTES (66 MILLION ROWS)
  TYPE foo_nt IS TABLE OF tableOne%ROWTYPE;
  v_foo_nt foo_nt;
BEGIN
  OPEN foo_cur;
  LOOP
    FETCH foo_cur BULK COLLECT INTO v_foo_nt LIMIT 1000;
    EXIT WHEN v_foo_nt.COUNT = 0; -- guard against an empty final fetch
    FORALL i IN v_foo_nt.FIRST..v_foo_nt.LAST
      INSERT INTO tracker t
        (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, v_foo_nt(i).foo, v_foo_nt(i).bar, v_foo_nt(i).baz);
    EXIT WHEN foo_cur%NOTFOUND;
  END LOOP;
  CLOSE foo_cur;
  COMMIT;
END;
回答by YePhIcK
First - how you can optimize your PL/SQL's performance:
首先 - 如何优化 PL/SQL 的性能:
- Disable indexes and any other constraints on target tables before you begin your load and re-enable them after you are done
- Don't wait to commit until the very end - have intermediate commit points to free up rollback segments
- 在开始加载之前禁用目标表上的索引和任何其他约束,并在完成后重新启用它们
- 不要等到最后才提交 - 设置中间提交点以释放回滚段
Second - don't do the insert with PL/SQL. Use bulk loading (as already suggested in some comments). You can easily find lots of info on bulk loading if you Google for "oracle sql loader".
第二 - 不要用 PL/SQL 进行插入。使用 BulkLoading(正如一些评论所建议的那样)。如果您在 Google 上搜索“oracle sql loader”,则可以轻松找到有关 BulkLoading 的大量信息
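As a hedged illustration of that route, a minimal SQL*Loader control file might look like the sketch below. It assumes the matching rows have first been extracted to a comma-delimited flat file named matching_rows.dat (a hypothetical name), and that a conventional-path load is acceptable so the id column can be filled from trackerSequence via the EXPRESSION clause rather than read from the file:

LOAD DATA
INFILE 'matching_rows.dat'
APPEND
INTO TABLE tracker
FIELDS TERMINATED BY ','
( foo
, bar
, baz
, id EXPRESSION "trackerSequence.nextval"
)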
回答by AnBisw
I have almost always received better performance with such bulk data inserts by using a combination of BITMAP INDEXES and a DPL (Direct Path Load), i.e. with the hint /*+ APPEND */.
通过结合使用 BITMAP INDEXES 和 DPL(直接路径加载,即 /*+ APPEND */ 提示),我几乎总是在这类批量数据插入中获得更好的性能。
I would also assume that with this you would have proper indexes on both TT.FOO, TT.BAR and TO.FOO, TO.BAR.
So something like:
我还假设这样一来,您在 TT.FOO, TT.BAR 和 TO.FOO, TO.BAR 上都已经建有合适的索引。所以类似这样:
INSERT /*+ APPEND*/
INTO TRACKER T
SELECT trackerSequence.nextval, to.foo,to.bar,to.baz
FROM tableOne to
WHERE EXISTS (SELECT 'x'
FROM tableTwo tt
WHERE TO.FOO = TT.FOO
AND TO.BAR = TT.BAR);
Also, keep in mind that the EXISTS clause might bite you back in certain situations, so you may want to use simple outer joins.
另外,请记住,EXISTS 子句在某些情况下可能会反过来拖累你,因此您可能希望改用简单的外连接。
INSERT /*+ APPEND*/
INTO TRACKER T
SELECT DISTINCT trackerSequence.nextval, to.foo,to.bar,to.baz
FROM tableOne to , tableTwo tt
WHERE TO.FOO = TT.FOO
AND TO.BAR = TT.BAR;
Remember - DPL (Direct Path Load) will not always improve the performance of your query; it may help if your table is properly partitioned.
请记住,DPL(直接路径加载)并不总能提高查询性能;如果您的表做了合理的分区,它可能会有所帮助。
Try EXPLAIN PLAN on these queries to find out which works best.
Also (as one of the answers already mentions), do not commit only at the very end, but do not commit on every record either. It would be better to use a custom commit point, something similar to using LIMIT XXXX while BULK COLLECTing. Your commit points will govern how large your ROLLBACK segments get. You can also manage your custom commit points (as simple as a counter) procedurally (i.e. in a PL/SQL block).
尝试对这些查询执行 EXPLAIN PLAN,找出最优方案。另外(正如已有答案提到的),不要只在最后提交,但也不要每条记录都提交。建议使用自定义提交点,类似于在 BULK COLLECT 时使用 LIMIT XXXX。提交点的间隔将决定 ROLLBACK 段会有多大。您也可以在过程代码中(即在 PL/SQL 块里)使用自定义提交点(简单到一个计数器即可)。
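For illustration, a hedged sketch of such a procedural commit point, built along the lines of the earlier BULK COLLECT answer; the LIMIT of 10000 rows and the commit interval of every 10 batches are arbitrary assumptions here, not tuned values:

DECLARE
  CURSOR foo_cur IS
    SELECT * FROM tableOne t1
     WHERE EXISTS (SELECT 'x' FROM tableTwo tt
                    WHERE t1.foo = tt.foo
                      AND t1.bar = tt.bar);
  TYPE foo_nt IS TABLE OF tableOne%ROWTYPE;
  v_rows    foo_nt;
  v_batches PLS_INTEGER := 0;  -- the simple counter acting as the commit point
BEGIN
  OPEN foo_cur;
  LOOP
    FETCH foo_cur BULK COLLECT INTO v_rows LIMIT 10000;
    EXIT WHEN v_rows.COUNT = 0;
    FORALL i IN v_rows.FIRST .. v_rows.LAST
      INSERT INTO tracker (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, v_rows(i).foo, v_rows(i).bar, v_rows(i).baz);
    v_batches := v_batches + 1;
    IF MOD(v_batches, 10) = 0 THEN
      COMMIT;  -- roughly every 100k rows, keeping undo/rollback usage bounded
    END IF;
  END LOOP;
  CLOSE foo_cur;
  COMMIT;  -- pick up the final partial interval
END;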
Query performance also depends on the HWM (high-water mark) of your table (to be specific); you would almost always want to operate under the HWM of the table. Whilst a TRUNCATE of the TRACKER table would help achieve this, the previous data on it would be lost, so that can hardly be a solution here. Follow this AskTom link to learn how to find the HWM.
查询性能还(具体而言)取决于您的表的 HWM(高水位线),您几乎总是希望在表的 HWM 之下操作。虽然对 TRACKER 表执行 TRUNCATE 有助于实现这一点,但表中原有的数据会丢失,所以这里很难作为解决方案。请参阅这个 AskTom 链接,了解如何查找 HWM。
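Since the AskTom link itself is not reproduced on this page, one common way to estimate the HWM (a sketch, assuming the TRACKER table lives in the current schema as an ordinary, non-partitioned table) is DBMS_SPACE.UNUSED_SPACE: the blocks below the high-water mark are roughly total_blocks minus unused_blocks.

DECLARE
  l_total_blocks   NUMBER;
  l_total_bytes    NUMBER;
  l_unused_blocks  NUMBER;
  l_unused_bytes   NUMBER;
  l_file_id        NUMBER;
  l_block_id       NUMBER;
  l_last_block     NUMBER;
BEGIN
  -- positional call: owner, segment name, segment type, then the OUT parameters
  DBMS_SPACE.UNUSED_SPACE(USER, 'TRACKER', 'TABLE',
                          l_total_blocks, l_total_bytes,
                          l_unused_blocks, l_unused_bytes,
                          l_file_id, l_block_id, l_last_block);
  -- run with SERVEROUTPUT ON to see the result
  DBMS_OUTPUT.PUT_LINE('Blocks below the HWM: ' || (l_total_blocks - l_unused_blocks));
END;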