
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/987013/

Date: 2020-09-01 02:25:45  Source: igfitidea

Bulk Insert into Oracle database: Which is better: FOR Cursor loop or a simple Select?

Tags: sql, oracle, plsql

Asked by Sathyajith Bhat

Which would be a better option for bulk insert into an Oracle database? A FOR cursor loop like:


DECLARE
   CURSOR C1 IS SELECT * FROM FOO;
BEGIN
   FOR C1_REC IN C1 LOOP
      INSERT INTO BAR(A,
                      B,
                      C)
           VALUES(C1_REC.A,
                  C1_REC.B,
                  C1_REC.C);
   END LOOP;
END;

or a simple select, like:


INSERT INTO BAR(A,
                B,
                C)
        (SELECT A,
                B,
                C
        FROM FOO);

Any specific reason either one would be better?


Answered by Josh Mein

I would recommend the SELECT option because cursor loops take longer.
The plain SELECT is also much easier to understand for anyone who has to modify your query.


Answered by Jeffrey Kemp

The general rule-of-thumb is, if you can do it using a single SQL statement instead of using PL/SQL, you should. It will usually be more efficient.


However, if you need to add more procedural logic (for some reason), you might need to use PL/SQL, but you should use bulk operations instead of row-by-row processing. (Note: in Oracle 10g and later, your FOR loop will automatically use BULK COLLECT to fetch 100 rows at a time; however your insert statement will still be done row-by-row).


e.g.


DECLARE
   TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
   TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
   TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
   rA tA;
   rB tB;
   rC tC;
BEGIN
   SELECT * BULK COLLECT INTO rA, rB, rC FROM FOO;
   -- (do some procedural logic on the data?)
   FORALL i IN rA.FIRST..rA.LAST
      INSERT INTO BAR(A,
                      B,
                      C)
      VALUES(rA(i),
             rB(i),
             rC(i));
END;

The above has the benefit of minimising context switches between SQL and PL/SQL. Oracle 11g also has better support for tables of records so that you don't have to have a separate PL/SQL table for each column.


Also, if the volume of data is very large, the code can be changed to process it in batches.

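A batched version might look like the following sketch. It assumes BAR has the same column layout as FOO so a whole-record insert works; the LIMIT of 1000 and the per-batch COMMIT are illustrative choices, not part of the original answer:

    DECLARE
       CURSOR c IS SELECT A, B, C FROM FOO;
       TYPE t_tab IS TABLE OF c%ROWTYPE INDEX BY PLS_INTEGER;
       l_tab t_tab;
    BEGIN
       OPEN c;
       LOOP
          -- fetch up to 1000 rows per round trip to bound memory use
          FETCH c BULK COLLECT INTO l_tab LIMIT 1000;
          EXIT WHEN l_tab.COUNT = 0;
          FORALL i IN 1 .. l_tab.COUNT
             INSERT INTO BAR VALUES l_tab(i);  -- whole-record insert
          COMMIT;  -- optional: commit per batch if undo space is constrained
       END LOOP;
       CLOSE c;
    END;

The LIMIT clause keeps the collection small regardless of how many rows FOO holds, trading a little extra looping for bounded PGA memory.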

Answered by MichaelN

If your rollback/undo segment can accommodate the size of the transaction, then option 2 is better. Option 1 is useful if you do not have the rollback capacity needed and have to break the large insert into smaller commits, so you don't get "rollback/undo segment too small" errors.


Answered by Scott Swank

A simple insert/select like your 2nd option is far preferable. Each insert in the 1st option requires a context switch from PL/SQL to SQL. Run each with trace/tkprof and examine the results.

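For example, tracing could be enabled for the session before running each version; the TRACEFILE_IDENTIFIER value and output file names below are illustrative:

    ALTER SESSION SET TRACEFILE_IDENTIFIER = 'bulk_insert_test';
    ALTER SESSION SET SQL_TRACE = TRUE;
    -- run the FOR-loop version, then the INSERT ... SELECT version
    ALTER SESSION SET SQL_TRACE = FALSE;
    -- then format the trace file on the server, e.g.:
    -- tkprof <tracefile>.trc report.txt sys=no

The tkprof report shows the parse/execute/fetch counts for each statement, which makes the per-row context switching of the loop version visible directly.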

If, as Michael mentions, your rollback cannot handle the statement, then have your DBA give you more. Disk is cheap, while the partial results that come from inserting your data in multiple passes are potentially quite expensive. (There is almost no undo associated with an insert.)


Answered by sulica

I think one important piece of information is missing from this question.


How many records will you insert?


  1. From 1 to approximately 10,000 rows, use a plain SQL statement (as others have said, it is easy to understand and easy to write).
  2. From approximately 10,000 to 100,000 rows, use a cursor, but add logic to commit every 10,000 records.
  3. From approximately 100,000 rows up to millions, use BULK COLLECT for better performance.

Answered by Arturo Hernandez

As you can see by reading the other answers, there are a lot of options available. If you are just doing < 10k rows you should go with the second option.


In short, the range from roughly >10k up to, say, <100k rows is kind of a gray area. A lot of old geezers will bark about big rollback segments. But honestly, hardware and software have made amazing progress, to the point where you may be able to get away with option 2 for a lot of records if you only run the code a few times. Otherwise you should probably commit every 1k-10k or so rows. Here is a snippet that I use. I like it because it is short and I don't have to declare a cursor. Plus, in Oracle 10g and later, the FOR loop gets implicit bulk fetching for free.


begin
    for r in (select rownum rn, t.* from foo t) loop
        insert into bar (A, B, C) values (r.A, r.B, r.C);
        if mod(r.rn, 1000) = 0 then
            commit;
        end if;
    end loop;
    commit;
end;

I found this link from the Oracle site that illustrates the options in more detail.


Answered by user2001117

You can use:


BULK COLLECT along with FORALL, which together are called bulk binding.


Because the PL/SQL FORALL operator can be up to 30x faster for simple table inserts.


BULK COLLECT and Oracle FORALL together are known as bulk binding. Bulk binds are a PL/SQL technique where, instead of executing multiple individual SELECT, INSERT, UPDATE, or DELETE statements to retrieve data from, or store data in, a table, all of the operations are carried out at once, in bulk. This avoids the context switching you get when the PL/SQL engine has to pass over to the SQL engine, then back to the PL/SQL engine, and so on, when you access rows one at a time. To do bulk binds with INSERT, UPDATE, and DELETE statements, you enclose the SQL statement within a PL/SQL FORALL statement. To do bulk binds with SELECT statements, you include the BULK COLLECT clause in the SELECT statement instead of using INTO.


It improves performance.


Answered by Hughsmg

I do neither for a daily complete reload of data. For example, say I am loading my Denver site. (There are other strategies for near-real-time deltas.)


I use a CREATE TABLE ... AS SELECT statement, which I have found to be almost as fast as a bulk load. In the example below, a CREATE TABLE statement is used to stage the data, casting the columns to the correct data types:


CREATE TABLE sales_dataTemp AS
SELECT CAST(column1 AS DATE)  AS SALES_QUARTER,
       CAST(sales AS NUMBER)  AS SALES_IN_MILLIONS,
       ....
FROM TABLE1;


This staging table exactly mirrors the structure of my target table, which is list-partitioned by site. I then do a partition swap with the DENVER partition, and I have a new data set.

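The swap itself is a fast data-dictionary operation rather than a data copy. Assuming the target table is named sales_data and the partition is named DENVER (both names are illustrative, not from the original answer), it might look like:

    ALTER TABLE sales_data
      EXCHANGE PARTITION denver
      WITH TABLE sales_dataTemp
      INCLUDING INDEXES
      WITHOUT VALIDATION;

After the exchange, the rows that were in sales_dataTemp sit in the DENVER partition, and the old partition contents end up in the staging table, which can then be dropped or truncated.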