
Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/11764873/

Date: 2020-09-19 01:00:41 | Source: igfitidea

Oracle SQL insert large data set efficiently via cursor use and single commit

Tags: sql, database, performance, oracle, oracle10g

Asked by Woot4Moo

The following SQL generates all matching records between two tables that have identical schemas and then proceeds to iterate over the cursor that stores this result set. I do a row by row insert with a commit at the end of this function. My question is how can I get the maximum performance from this type of query? Code follows:

DECLARE
   CURSOR foo IS
      SELECT * FROM tableOne t1
      WHERE EXISTS (SELECT 1 FROM tableTwo tt
                    WHERE t1.foo = tt.foo
                      AND t1.bar = tt.bar);  -- THIS TAKES 5 MINUTES (66 MILLION ROWS)
BEGIN
   FOR nextFoo IN foo
   LOOP
      INSERT INTO tracker
         (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, nextFoo.foo, nextFoo.bar, nextFoo.baz);
   END LOOP;
   COMMIT;
END;

This query can take upwards of an hour, and I am trying to reduce its time cost. In general I will be processing 140 million records, so I expect this process to take roughly twice as long. All columns are indexed.

Version information: Oracle 10g (10.2)

Accepted answer by Woot4Moo

I have found the following will do 130 million inserts in about 49 minutes.

INSERT INTO tracker
   SELECT * FROM tableOne t1
   WHERE NOT EXISTS (SELECT 1 FROM tableTwo tt
                     WHERE t1.foo = tt.foo
                       AND t1.bar = tt.bar);

Answer by Markus Mikkolainen

How about:

INSERT INTO tracker (id, foo, bar, baz)
SELECT trackerSequence.nextval
      ,t1.foo
      ,t1.bar
      ,t1.baz
  FROM tableOne t1
       INNER JOIN tableTwo tt
          ON (t1.foo = tt.foo AND t1.bar = tt.bar);

I wonder if that would be optimized better.

Also make sure the tracker table's indexes are disabled while inserting.

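A sketch of what that might look like; the index name is hypothetical, and on 10g you would mark the index UNUSABLE rather than literally "disable" it:

```sql
-- Hypothetical index name; repeat for each index on tracker
ALTER INDEX tracker_foo_idx UNUSABLE;

-- Let DML proceed without maintaining unusable indexes
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- ... run the big INSERT here ...

-- Rebuild once the load is done
ALTER INDEX tracker_foo_idx REBUILD;
```

Rebuilding one index over the finished table is generally far cheaper than maintaining it row by row during a 100M+ row load.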

Answer by Mike

OK, I know you wanted the cursor...

The only real advantage of using the cursor is that it lets you commit every 10k or so rows when processing that much data, to avoid filling up the logs.

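A minimal sketch of that periodic-commit pattern (the 10,000-row interval is an arbitrary choice of mine, not from the question):

```sql
DECLARE
   v_count PLS_INTEGER := 0;
BEGIN
   FOR rec IN (SELECT t1.foo, t1.bar, t1.baz
                 FROM tableOne t1
                WHERE EXISTS (SELECT 1 FROM tableTwo tt
                              WHERE t1.foo = tt.foo AND t1.bar = tt.bar))
   LOOP
      INSERT INTO tracker (id, foo, bar, baz)
      VALUES (trackerSequence.nextval, rec.foo, rec.bar, rec.baz);

      v_count := v_count + 1;
      IF MOD(v_count, 10000) = 0 THEN
         COMMIT;  -- free up undo/rollback every 10k rows
      END IF;
   END LOOP;
   COMMIT;  -- commit the final partial batch
END;
```

Be aware that committing across an open cursor risks ORA-01555 (snapshot too old) on long-running queries, so don't make the interval too small.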

Unless you really need the cursor, eliminate the row-by-row processing:

insert into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where 
t1.foo = t2.foo and
t1.bar = t2.bar;

Direct-path insert hint, as suggested:

insert /*+ append */ into tracker (id, foo, bar, baz)
select trackersequence.nextval, t1.foo, t1.bar, t2.baz
from tableone t1, tabletwo t2 where 
t1.foo = t2.foo and
t1.bar = t2.bar;

Answer by Gaurav Soni

DECLARE
   CURSOR foo_cur IS
      SELECT * FROM tableOne t1
      WHERE EXISTS (SELECT 1 FROM tableTwo tt
                    WHERE t1.foo = tt.foo
                      AND t1.bar = tt.bar);  -- THIS TAKES 5 MINUTES (66 MILLION ROWS)

   TYPE foo_nt IS TABLE OF tableOne%ROWTYPE;
   v_foo_nt foo_nt;
BEGIN
   OPEN foo_cur;
   LOOP
      FETCH foo_cur BULK COLLECT INTO v_foo_nt LIMIT 1000;
      EXIT WHEN v_foo_nt.COUNT = 0;  -- guard before FORALL: an empty batch has NULL FIRST/LAST

      FORALL i IN v_foo_nt.FIRST .. v_foo_nt.LAST
         INSERT INTO tracker (id, foo, bar, baz)
         VALUES (trackerSequence.nextval,
                 v_foo_nt(i).foo, v_foo_nt(i).bar, v_foo_nt(i).baz);
   END LOOP;
   CLOSE foo_cur;
   COMMIT;
END;

Answer by YePhIcK

First, how you can optimize the PL/SQL's performance:

  • Disable indexes and any other constraints on the target tables before you begin your load, and re-enable them after you are done
  • Don't commit only at the very end - have intermediate commit points to free up rollback segments

Second, don't do the insert with PL/SQL. Use bulk loading (as some comments already suggested). You can easily find lots of information on bulk loading if you Google for "oracle sql loader".

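As a rough sketch, a SQL*Loader setup might look like this; the file names and the assumption that the data arrives as a CSV file are mine, not from the answer:

```
-- tracker.ctl (control file; names are hypothetical)
LOAD DATA
INFILE 'tracker_data.csv'
APPEND
INTO TABLE tracker
FIELDS TERMINATED BY ','
(id SEQUENCE(MAX,1),
 foo,
 bar,
 baz)
```

Invoked with something like `sqlldr userid=user/pass control=tracker.ctl direct=true`, where `direct=true` requests a direct-path load.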

Answer by AnBisw

I have almost always gotten better performance from such bulk data inserts by using a combination of BITMAP INDEXES and a DPL (direct-path load), i.e. with the /*+ APPEND */ hint.

I would also assume that with this you have proper indexes on the FOO and BAR columns of both tables. So something like:

INSERT /*+ APPEND */
  INTO tracker
SELECT trackerSequence.nextval, t1.foo, t1.bar, t1.baz
  FROM tableOne t1
 WHERE EXISTS (SELECT 'x'
                 FROM tableTwo tt
                WHERE t1.foo = tt.foo
                  AND t1.bar = tt.bar);

Also, keep in mind that the EXISTS clause might bite you back in certain situations, so you may want to use a simple join instead.

INSERT /*+ APPEND */
  INTO tracker
SELECT trackerSequence.nextval, foo, bar, baz
  FROM (SELECT DISTINCT t1.foo, t1.bar, t1.baz
          FROM tableOne t1, tableTwo tt
         WHERE t1.foo = tt.foo
           AND t1.bar = tt.bar);

(The sequence has to sit outside the inline view, because NEXTVAL cannot be combined with DISTINCT in the same query block.)

Remember, a DPL (direct-path load) will not always improve the performance of your query; it is more likely to help if your table is properly partitioned.

Try EXPLAIN PLAN on these queries to find the best one. Also (as one of the answers already mentions), do not commit only at the very end, but do not commit on every record either. The suggestion is to use a custom commit interval, similar to using LIMIT xxxx while BULK COLLECTing. Your commit points govern how large your ROLLBACK segments grow. You can also implement custom commit points (as simple as a counter) procedurally (i.e. in a PL/SQL block).

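For example, the plan for the matching query can be inspected with the standard EXPLAIN PLAN / DBMS_XPLAN combination available on 10g:

```sql
EXPLAIN PLAN FOR
SELECT t1.foo, t1.bar, t1.baz
  FROM tableOne t1
 WHERE EXISTS (SELECT 1 FROM tableTwo tt
               WHERE t1.foo = tt.foo AND t1.bar = tt.bar);

-- Display the plan that was just generated
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing plans for the EXISTS and join variants will tell you which access path the optimizer actually chooses for your data.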

Query performance also depends on the HWM (high-water mark) of your table; you would almost always want to operate below the table's HWM. While a TRUNCATE on the TRACKER table would help achieve this, the data already in it would be lost, so that is hardly a solution here. Follow this AskTom link to learn how to find the HWM.
