Update million rows using rowids from one table to another Oracle

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17156137/

Asked by Raj A
Hi, I have two tables with a million rows in each, on Oracle 11g R1. I am sure many of us have run into this situation.
What is the most efficient and fastest way to update from one table to another where the values are DIFFERENT?
E.g.: Table 1 has 4 NUMBER columns with high precision, e.g. 0.2212454215454212.
Table 2 has 6 columns. Update four of Table 2's columns based on a column common to both tables, and only the rows whose values differ.
I have something like this:
DECLARE
   TYPE test1_t IS TABLE OF test.score%TYPE INDEX BY PLS_INTEGER;
   TYPE test2_t IS TABLE OF test.id%TYPE    INDEX BY PLS_INTEGER;
   TYPE test3_t IS TABLE OF test.crank%TYPE INDEX BY PLS_INTEGER;

   vscore test1_t;
   vid    test2_t;
   vurank test3_t;
BEGIN
   -- load the source table into the collections in one pass
   SELECT id, score, urank
     BULK COLLECT INTO vid, vscore, vurank
     FROM test;

   -- one MERGE per collected row, bound in bulk
   FORALL i IN 1 .. vid.COUNT
      MERGE INTO final T
      USING (SELECT vid(i)    AS o_id,
                    vurank(i) AS o_urank,
                    vscore(i) AS o_score
               FROM DUAL) S
      ON (S.o_id = T.id)
      WHEN MATCHED THEN
         UPDATE SET T.crank = S.o_urank
                WHERE T.crank <> S.o_urank;
END;
Since the numbers are stored with high precision, is that slowing it down?
I tried a Bulk Collect and MERGE combination, but it still takes around 30 minutes in the worst-case scenario when I have to update 1 million rows.
Is there something that can be done with rowid? Help will be appreciated.
Answered by David Aldridge
If you want to update all the rows, then just use update:
update table_1
set    (col1, col2) = (select col1, col2
                       from   table2
                       where  table2.col_a = table_1.col_a and
                              table2.col_b = table_1.col_b)
Bulk collect or any PL/SQL technique will always be slower than a pure SQL technique.
The numeric precision is probably not significant, and rowid is not relevant as there is no common value between the two tables.
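Note that if some table_1 rows have no match in table2, the plain update above sets their col1 and col2 to NULL, because the correlated subquery returns no rows for them. A hedged sketch of a guarded variant, assuming a match is not guaranteed for every row:

update table_1
set    (col1, col2) = (select col1, col2
                       from   table2
                       where  table2.col_a = table_1.col_a and
                              table2.col_b = table_1.col_b)
-- only touch rows that actually have a match, so the rest are not nulled out
where  exists (select null
               from   table2
               where  table2.col_a = table_1.col_a and
                      table2.col_b = table_1.col_b)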
Answered by APC
When dealing with millions of rows, parallel DML is a game changer. Of course you need to have Enterprise Edition to use parallel, but it's really the only thing which will make much difference.
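A minimal sketch of what that could look like for the question's tables (the degree of 8 is an assumption; tune it to your hardware):

ALTER SESSION ENABLE PARALLEL DML;

MERGE /*+ PARALLEL(t 8) */ INTO final t
USING (SELECT id, urank FROM test) s
ON (t.id = s.id)
WHEN MATCHED THEN
   UPDATE SET t.crank = s.urank
          WHERE t.crank <> s.urank;

-- a parallel DML transaction must be committed before the session
-- can query the modified table again (ORA-12838 otherwise)
COMMIT;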
I recommend you read an article on OraFAQ by rleishman comparing 8 Bulk Update Methods. His key finding is that "the cost of disk reads so far outweighs the context switches that that they are barely noticable (sic)". In other words, unless your data is already cached in memory there really isn't a significant difference between SQL and PL/SQL approaches.
The article does have some neat suggestions on employing parallel. The surprising outcome is that a parallel pipelined function offers the best performance.
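For illustration, a minimal sketch of that approach adapted to the question's tables; this is not the article's exact code, and the object type names and the autonomous-transaction commit are assumptions of this sketch:

CREATE TYPE upd_row_t AS OBJECT (id NUMBER);
/
CREATE TYPE upd_row_tab AS TABLE OF upd_row_t;
/
CREATE OR REPLACE FUNCTION parallel_update (p_cur SYS_REFCURSOR)
   RETURN upd_row_tab
   PIPELINED
   PARALLEL_ENABLE (PARTITION p_cur BY ANY)
AS
   PRAGMA AUTONOMOUS_TRANSACTION; -- allows DML inside a function called from SQL
   l_id    test.id%TYPE;
   l_score test.score%TYPE;
   l_urank test.urank%TYPE;
BEGIN
   LOOP
      FETCH p_cur INTO l_id, l_score, l_urank;
      EXIT WHEN p_cur%NOTFOUND;
      UPDATE final
         SET crank = l_urank
       WHERE id = l_id
         AND crank <> l_urank;
      PIPE ROW (upd_row_t(l_id)); -- report each id handled by this slave
   END LOOP;
   COMMIT;
   RETURN;
END;
/
-- driven in parallel by a plain query:
SELECT COUNT(*)
  FROM TABLE(parallel_update(CURSOR(
         SELECT /*+ PARALLEL(test) */ id, score, urank FROM test)));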
Answered by Mohsen Heydari
Focusing on the syntax that has been used and skipping the logic (perhaps a pure update plus a pure insert may solve the problem; also consider the cost of MERGE, indexes, a possible full scan on merge, and so on):

You should use the LIMIT clause in the BULK COLLECT syntax. Using a bulk collect with no limit:
- will cause all records to be loaded into memory
- with no partially committed merges, will create a large redo log that must be applied at the end of the process

Both will result in low performance.
DECLARE
   v_fetchSize NUMBER := 1000; -- batch size; could be scaled based on hardware, design, ...

   CURSOR a_cur IS
      SELECT id, score, urank FROM test;

   TYPE myarray IS TABLE OF a_cur%ROWTYPE;
   cur_array myarray;
BEGIN
   OPEN a_cur;
   LOOP
      FETCH a_cur BULK COLLECT INTO cur_array LIMIT v_fetchSize;
      FORALL i IN 1 .. cur_array.COUNT
         -- DO Operation (exactly one DML statement, e.g. the question's MERGE)
      COMMIT; -- commit each batch to keep undo/redo small
      EXIT WHEN a_cur%NOTFOUND;
   END LOOP;
   CLOSE a_cur;
END;
Answered by ThinkJet
Just to be sure: test.id and final.id must be indexed.

With the first select ... from test you get too many records from Table 1, and after that you have to compare all of them with the records in Table 2. Try to select only what you need to update. So, there are at least 2 variants:
a) select only changed records:
SELECT source_table.id, source_table.score, source_table.urank
  BULK COLLECT INTO vid, vscore, vurank
  FROM test  source_table,
       final destination_table
 WHERE source_table.id = destination_table.id
   AND source_table.crank <> destination_table.crank;
b) Add a new datetime field to the source table and fill it with the current time in a trigger. When synchronizing, pick only the records changed during the last day. This field needs to be indexed.
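A hedged sketch of that change (the column name modified_at and the trigger name are made up for illustration):

ALTER TABLE test ADD (modified_at DATE);
CREATE INDEX test_modified_ix ON test (modified_at);

CREATE OR REPLACE TRIGGER test_modified_trg
   BEFORE INSERT OR UPDATE ON test
   FOR EACH ROW
BEGIN
   :NEW.modified_at := SYSDATE; -- stamp every changed row with the current time
END;
/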
After such a change, in the update phase you don't need to compare the other fields, only match the IDs:
FORALL i IN 1 .. vid.COUNT
   MERGE INTO final T
   USING (SELECT vid(i)    AS o_id,
                 vurank(i) AS o_urank,
                 vscore(i) AS o_score
            FROM DUAL) S
   ON (S.o_id = T.id)
   WHEN MATCHED THEN
      UPDATE SET T.crank = S.o_urank;
If you worry about the size of the undo/redo segments then variant b) is more useful, because you can take the records from the source Table 1 divided into time slices and commit the changes after updating every slice, e.g. from 00:00 to 01:00, from 01:00 to 02:00, and so on. In this variant the update can be done by a plain SQL statement, without selecting data into collections, while maintaining acceptable sizes of the redo/undo logs.