Deleting a LOT of data in Oracle (SQL)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/644975/

Date: 2020-09-01 01:25:04 | Source: igfitidea

Deleting a LOT of data in Oracle

Tags: sql, oracle, plsql

Asked by Sarah Mei

I am not a database person, exactly, and most of my db work has been with MySQL, so forgive me if something in this question is incredibly naïve.

I need to delete 5.5 million rows from an Oracle table that has about 100 million rows. I have all the IDs of the rows I need to delete in a temporary table. If it were a just a few thousand rows, I'd do this:


DELETE FROM table_name WHERE id IN (SELECT id FROM temp_table);
COMMIT;

Is there anything I need to be aware of, and/or do differently, because it's 5.5 million rows? I thought about doing a loop, something like this:


DECLARE
  vCT NUMBER(38) := 0;

BEGIN
  FOR t IN (SELECT id FROM temp_table) LOOP
    DELETE FROM table_name WHERE id = t.id;
    vCT := vCT + 1;
    IF MOD(vCT,200000) = 0 THEN
      COMMIT;
    END IF;
  END LOOP;
  COMMIT;
END;

First of all - is this doing what I think it is - batching commits of 200,000 at a time? Assuming it is, I'm still not sure if it's better to generate 5.5 million SQL statements, and commit in batches of 200,000, or to have one SQL statement and commit all at once.


Ideas? Best practices?


EDIT: I ran the first option, the single delete statement, and it only took 2 hours to complete in development. Based on that, it's queued to be run in production.


Answered by Jiri Klouda

The first approach is better, because you give the query optimizer a clear picture of what you are trying to do, instead of trying to hide it. The database engine might take a different approach to deleting 5.5m (or 5.5% of the table) internally than to deleting 200k (or 0.2%).


There is also an article about massive DELETEs in Oracle which you might want to read.

Answered by FerranB

The fastest way is to create a new table with CREATE TABLE AS SELECT, using the NOLOGGING option. I mean:

ALTER TABLE table_to_delete RENAME TO tmp;
CREATE TABLE table_to_delete NOLOGGING AS SELECT .... ;

Of course, you have to recreate the constraints (with NOVALIDATE), the indexes (with NOLOGGING), the grants, and so on, but it is very, very fast.

If this would cause trouble in production, you can do the following:

ALTER TABLE table_to_delete RENAME to tmp;
CREATE VIEW table_to_delete AS SELECT * FROM tmp;
-- Everything up to here is nearly instantaneous
CREATE TABLE new_table NOLOGGING AS SELECT .... FROM tmp WHERE ...;
<create indexes with nologging>
<create constraints with novalidate>
<create other things...>
-- From here ...
DROP VIEW table_to_delete;
ALTER TABLE new_table RENAME TO table_to_delete;
-- To here, also instantly

You have to take care of the following:

  • Stored procedures may be invalidated, but they will be recompiled the next time they are called. You have to test this.
  • NOLOGGING means that minimal redo is generated. If you have the DBA role, run ALTER SYSTEM CHECKPOINT to ensure no data is lost if the instance crashes.
  • For NOLOGGING to take effect, the tablespace also has to be in NOLOGGING mode.
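
The recreate step mentioned above might be sketched like this (the object and column names here are illustrative placeholders, not part of the original answer):

```sql
-- Recreate an index without logging
CREATE INDEX table_to_delete_ix1
    ON table_to_delete (some_column) NOLOGGING;

-- Re-add a constraint without validating the existing rows
ALTER TABLE table_to_delete
  ADD CONSTRAINT table_to_delete_pk PRIMARY KEY (id)
  ENABLE NOVALIDATE;
```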

Another option, better than issuing millions of single-row deletes, is:

-- Create table with ids
DELETE FROM table_to_delete
 WHERE ID in (SELECT ID FROM table_with_ids WHERE ROWNUM < 100000);
DELETE FROM table_with_ids WHERE ROWNUM < 100000;
COMMIT;
-- Run this 50 times ;-)
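
The "run this 50 times" step could be wrapped in a loop; here is a rough sketch (not the answer author's exact code, and using the same placeholder table names):

```sql
DECLARE
  v_batch CONSTANT PLS_INTEGER := 100000;
BEGIN
  LOOP
    -- Delete one batch of rows from the big table
    DELETE FROM table_to_delete
     WHERE id IN (SELECT id FROM table_with_ids WHERE ROWNUM <= v_batch);
    -- Remove the processed ids
    DELETE FROM table_with_ids WHERE ROWNUM <= v_batch;
    EXIT WHEN SQL%ROWCOUNT = 0;  -- no ids left to process
    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```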

The PL/SQL choice is not advisable because it can raise a Snapshot too old error, since you are committing (and closing the transaction) while the cursor you are looping over is still open and you want to keep using it. Oracle allows this, but it is not good practice.

UPDATE: Why can I be sure the last PL/SQL block will work? Because I assume that:

  • Nobody else is using this temporary table for any reason (DBAs or jobs gathering statistics, DBA tasks such as a table move, record inserts, and so on). That can be ensured because it is an auxiliary table used only for this purpose.
  • Given that assumption, the query will be executed with exactly the same plan and will return the rows in the same order.

Answered by Quassnoi

When performing massive deletions in Oracle, make sure you are not running out of UNDO SEGMENTS.


When performing DML, Oracle first writes all changes into the REDO log (the old data along with the new data).

When the REDO log is filled or a timeout occurs, Oracle performs log synchronization: it writes the new data into the datafiles (in your case, marks the datafile blocks as free) and writes the old data into the UNDO tablespace (so that it remains visible to concurrent transactions until you commit your changes).

When you commit your changes, the space in the UNDO segments occupied by your transaction is freed.

This means that if you delete 5M rows of data, you'll need space for all of these rows in your UNDO segments, so that the data can be moved there first (all at once) and deleted only after the commit.
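
As a rough way to watch undo consumption while such a delete runs, one could query the V$TRANSACTION view (the columns below are standard dictionary columns, but this monitoring query is a sketch, not part of the original answer):

```sql
-- Undo blocks and records used by currently active transactions
SELECT s.sid,
       s.username,
       t.used_ublk AS undo_blocks,
       t.used_urec AS undo_records
FROM   v$transaction t
       JOIN v$session s ON s.taddr = t.addr;
```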

This also means that concurrent queries (if any) will need to read from the REDO logs or UNDO segments when performing table scans. This is not the fastest way to access data.

This also means that if the optimizer selects a HASH JOIN for your deletion query (which it most probably will), and the temp table does not fit into the HASH_AREA_SIZE (which most probably will be the case), then the query will need several scans over the big table, and some parts of the table will already have been moved into REDO or UNDO.

Given all of the above, you'd probably be better off deleting the data in chunks of 200,000 and committing the changes in between.

Thus you will, first, get rid of the problems described above, and, second, optimize your HASH_JOIN, as you will have the same number of reads but the reads themselves will be more efficient.


In your case, though, I would try to force the optimizer to use NESTED LOOPS, as I expect it will be faster in your case.


To do this, make sure your temp table has a primary key on ID, and rewrite your query as follows:

DELETE  
FROM   (
       SELECT  /*+ USE_NL(tt, tn) */
               tn.id
       FROM    temp_table tt, table_name tn
       WHERE   tn.id = tt.id
       )

You'll need to have a primary key on temp_table for this query to work.
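
Adding that primary key might look like this (the constraint name is illustrative):

```sql
ALTER TABLE temp_table
  ADD CONSTRAINT temp_table_pk PRIMARY KEY (id);
```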

Compare it with the following:


DELETE  
FROM   (
       SELECT  /*+ USE_HASH(tn tt) */
               tn.id
       FROM    temp_table tt, table_name tn
       WHERE   tn.id = tt.id
       )

See which is faster and stick with that.

Answered by Jon Ericson

It's better to do everything at once as in your first example. But I'd definitely go over it with your DBA first since they may want to reclaim the blocks you are no longer using after the purge. Also, there may be scheduling concerns that are not normally visible from the user perspective.

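
Reclaiming those blocks after the purge might look something like the following (a hedged sketch, not the answer author's procedure; SHRINK SPACE requires row movement and a segment-space-managed tablespace, so your DBA may prefer a different method such as a table move or rebuild):

```sql
ALTER TABLE table_name ENABLE ROW MOVEMENT;
ALTER TABLE table_name SHRINK SPACE;  -- compacts the segment and lowers the high-water mark
```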

Answered by Gary Myers

If your original SQL takes a very long time, some concurrent SQLs may run slowly as they have to use UNDO to rebuild a version of the data without your uncommitted changes.


A compromise might be something like this:

BEGIN
  FOR i IN 1..100 LOOP
    DELETE FROM table_name
     WHERE id IN (SELECT id FROM temp_table)
       AND ROWNUM < 100000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
END;

You can adjust ROWNUM as required. A smaller ROWNUM means more frequent commits and (probably) reduced impact on other sessions in terms of needing to apply undo. However, depending on execution plans, there may be other impacts and it will probably take more time overall. Technically the 'FOR' part of the loop is unnecessary as the EXIT will end the loop. But I'm paranoid about unlimited loops as it is a pain to kill the session if they do get stuck.


Answered by WW.

I would recommend running this as a single delete.


Are there any child tables of the one you are deleting from? If so, make sure the foreign key in those tables is indexed. Otherwise, you might do a full scan of the child table for every row you delete which could make things very slow.

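
A common pattern for spotting unindexed foreign keys on the child tables looks something like this (a simplified sketch over the standard dictionary views; it ignores column position within composite keys, so treat its output as a starting point):

```sql
-- Foreign-key columns with no index on the same column of the child table
SELECT cc.table_name, cc.column_name, cc.constraint_name
FROM   user_cons_columns cc
       JOIN user_constraints c
         ON c.constraint_name = cc.constraint_name
WHERE  c.constraint_type = 'R'
  AND  NOT EXISTS (
         SELECT 1
         FROM   user_ind_columns ic
         WHERE  ic.table_name  = cc.table_name
           AND  ic.column_name = cc.column_name
       );
```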

You might want some ways to check the progress of the delete as it runs. See How to check oracle database for long running queries?


As other people have suggested, if you want to test the water, you can add rownum < 10000 to the end of your query.

Answered by Mark Nold

I've done something similar in the past with Oracle 7, where I had to delete millions of rows from thousands of tables. For all-round performance, and especially for the large deletes (a million-plus rows in one table), this script worked well.

You'll have to modify it slightly (i.e. check the users/passwords, plus get your rollback segments right). You also really need to discuss this with your DBA and run it in a TEST environment first. Having said all of that, it's pretty easy. The function delete_sql() looks up a batch of rowids in the table you specify and then deletes them batch by batch. For example:

exec delete_sql('MSF710', 'select rowid from msf710 s where  (s.equip_no, s.eq_tran_date, s.comp_data, s.rec_710_type, s.seq_710_no) not in  (select c.equip_no, c.eq_tran_date, c.comp_data, c.rec_710_type, c.seq_710_no  from  msf710_sched_comm c)', 500);

The above example deletes 500 records at a time from table MSF710, based on an SQL statement.

If you need to delete data from multiple tables, just include additional exec delete_sql(...) lines in the file delete-tables.sql.

Oh and remember to put your rollback segments back online, it's not in the script.


spool delete-tables.log;
connect system/SYSTEM_PASSWORD
alter rollback segment r01 offline;
alter rollback segment r02 offline;
alter rollback segment r03 offline;
alter rollback segment r04 offline;

connect mims_3015/USER_PASSWORD

CREATE OR REPLACE PROCEDURE delete_sql (myTable in VARCHAR2, mySql in VARCHAR2, commit_size in number) is
  i           INTEGER;
  sel_id      INTEGER;
  del_id      INTEGER;
  exec_sel    INTEGER;
  exec_del    INTEGER;
  del_rowid   ROWID;

  start_date  DATE;
  end_date    DATE;
  s_date      VARCHAR2(1000);
  e_date      VARCHAR2(1000);
  tt          FLOAT;
  lrc         integer;


BEGIN
  --dbms_output.put_line('SQL is ' || mySql);
  i := 0;
  start_date:= SYSDATE;
  s_date:=TO_CHAR(start_date,'DD/MM/YY HH24:MI:SS');


  --dbms_output.put_line('Deleting ' || myTable);
  sel_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(sel_id,mySql,dbms_sql.v7);
  DBMS_SQL.DEFINE_COLUMN_ROWID(sel_id,1,del_rowid);
  exec_sel := DBMS_SQL.EXECUTE(sel_id);
  del_id := DBMS_SQL.OPEN_CURSOR;
  DBMS_SQL.PARSE(del_id,'delete from ' || myTable || ' where rowid = :del_rowid',dbms_sql.v7);
 LOOP
   IF DBMS_SQL.FETCH_ROWS(sel_id) >0 THEN
      DBMS_SQL.COLUMN_VALUE(sel_id,1,del_rowid);
      lrc := dbms_sql.last_row_count;
      DBMS_SQL.BIND_VARIABLE(del_id,'del_rowid',del_rowid);
      exec_del := DBMS_SQL.EXECUTE(del_id);

      -- you need to get the last_row_count earlier as it changes.
      if mod(lrc,commit_size) = 0 then
        i := i + 1;
        --dbms_output.put_line(myTable || ' Commiting Delete no ' || i || ', Rowcount : ' || lrc);
        COMMIT;
      end if;
   ELSE 
       exit;
   END IF;
 END LOOP;
  i := i + 1;
  --dbms_output.put_line(myTable || ' Final Commiting Delete no ' || i || ', Rowcount : ' || dbms_sql.last_row_count);
  COMMIT;
  DBMS_SQL.CLOSE_CURSOR(sel_id);
  DBMS_SQL.CLOSE_CURSOR(del_id);

  end_date := SYSDATE;
  e_date := TO_CHAR(end_date,'DD/MM/YY HH24:MI:SS');
  tt:= trunc((end_date - start_date) * 24 * 60 * 60,2);
  dbms_output.put_line('Deleted ' || myTable || ' Time taken is ' || tt || 's from ' || s_date || ' to ' || e_date || ' in ' || i || ' deletes and Rows = ' || dbms_sql.last_row_count);

END;
/

CREATE OR REPLACE PROCEDURE delete_test (myTable in VARCHAR2, mySql in VARCHAR2, commit_size in number) is
  i integer;
  start_date DATE;
  end_date DATE;
  s_date VARCHAR2(1000);
  e_date VARCHAR2(1000);
  tt FLOAT;
BEGIN
  start_date:= SYSDATE;
  s_date:=TO_CHAR(start_date,'DD/MM/YY HH24:MI:SS');
  i := 0;
  i := i + 1;
  dbms_output.put_line(i || ' SQL is ' || mySql);
  end_date := SYSDATE;
  e_date := TO_CHAR(end_date,'DD/MM/YY HH24:MI:SS');
  tt:= round((end_date - start_date) * 24 * 60 * 60,2);
  dbms_output.put_line(i || ' Time taken is ' || tt || 's from ' || s_date || ' to ' || e_date);
END;
/

show errors procedure delete_sql
show errors procedure delete_test

SET SERVEROUTPUT ON FORMAT WRAP SIZE 200000; 

exec delete_sql('MSF710', 'select rowid from msf710 s where  (s.equip_no, s.eq_tran_date, s.comp_data, s.rec_710_type, s.seq_710_no) not in  (select c.equip_no, c.eq_tran_date, c.comp_data, c.rec_710_type, c.seq_710_no  from  msf710_sched_comm c)', 500);






spool off;

Oh, and one last tip: it's going to be slow, and depending on the table it may require some downtime. Testing, timing, and tuning are your best friends here.

Answered by Evan

All of the answers here are great, just one thing to add: if you want to delete all of the records in a table, and are sure you won't need to roll back, then you want to use the TRUNCATE TABLE command.

(In your case, you only wanted to delete a subset, but for anybody lurking with a similar problem, I thought I'd add this)

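
A minimal sketch of that approach, using the question's placeholder table name (note that TRUNCATE is DDL: it commits implicitly and cannot be rolled back):

```sql
TRUNCATE TABLE table_name;

-- Optionally keep the allocated extents for a fast reload:
-- TRUNCATE TABLE table_name REUSE STORAGE;
```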

Answered by sandeep

The easiest way, for me, is:

DECLARE
   -- :exit_code is a host bind variable (e.g. VARIABLE exit_code NUMBER in SQL*Plus)
   L_exit_flag  VARCHAR2(2) := 'N';
   L_row_count  NUMBER := 0;
BEGIN
   :exit_code := 0;
   LOOP
      DELETE table_name
       WHERE condition(s) AND ROWNUM <= 200000;
      L_row_count := L_row_count + SQL%ROWCOUNT;
      IF SQL%ROWCOUNT = 0 THEN
         COMMIT;
         :exit_code := 0;
         L_exit_flag := 'Y';
      END IF;
      COMMIT;
      IF L_exit_flag = 'Y' THEN
         DBMS_OUTPUT.PUT_LINE ('Finally Number of Records Deleted : ' || L_row_count);
         EXIT;
      END IF;
   END LOOP;
EXCEPTION
   WHEN OTHERS THEN
      ROLLBACK;
      DBMS_OUTPUT.PUT_LINE ('Error Code: ' || SQLCODE);
      DBMS_OUTPUT.PUT_LINE ('Error Message: ' || SUBSTR (SQLERRM, 1, 240));
      :exit_code := 255;
END;