Techniques for removing old data on Oracle databases

Notice: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/981762/

Tags: oracle, partitioning, purge

Asked by Steve Broberg

We have a mature Oracle database application (in production for over 10 years), and during that time, we have been using scripts of our own devising to remove old data that is no longer needed. They work by issuing delete statements against the appropriate tables, in a loop with frequent commits, in order to avoid overloading the system with i/o or using too much undo space.
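
For concreteness, the pattern is roughly the following; the table name, column and batch size here are only illustrative (our real rules are driven by per-account retention settings):

-- Sketch only: batched deletes with frequent commits to limit undo usage.
DECLARE
  v_rows PLS_INTEGER;
BEGIN
  LOOP
    DELETE FROM invoices
     WHERE invoice_date < ADD_MONTHS(TRUNC(SYSDATE), -3)
       AND ROWNUM <= 10000;        -- cap the work done per pass
    v_rows := SQL%ROWCOUNT;
    COMMIT;                        -- commit each batch to keep undo small
    EXIT WHEN v_rows = 0;
  END LOOP;
END;
/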

They work fine, for the most part. They run daily, and it takes about an hour to remove the oldest day's worth of data from the system. The main concerns I have are the effects that all this deleting may have on tables and indexes, and the fact that even though they don't overly load the system, deleting one day's worth of data in that short time does have the effect of blowing out the instance's buffer cache, resulting in subsequent queries running slightly slower for the next few hours as the cache is gradually restored.

For years we've been considering better methods. In the past, I had heard that people used partitioned tables to manage old data reaping - one month per partition, for example, and dropping the oldest partition on a monthly basis. The main drawback to this approach is that our reaping rules go beyond "remove month X". Users are allowed to specify how long data must stay in the system, based on key values (e.g., in an invoice table, account foo can be removed after 3 months, but account bar may need to remain for 2 years).
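
For reference, the scheme I mean looks roughly like this (table, column and partition names are purely illustrative); reaping a month then becomes a quick metadata operation instead of row-by-row deletes:

-- Sketch only: one range partition per month, oldest partition dropped monthly.
CREATE TABLE invoices (
  invoice_id   NUMBER NOT NULL,
  account_id   NUMBER NOT NULL,
  invoice_date DATE   NOT NULL
)
PARTITION BY RANGE (invoice_date) (
  PARTITION p_2009_05 VALUES LESS THAN (DATE '2009-06-01'),
  PARTITION p_2009_06 VALUES LESS THAN (DATE '2009-07-01')
);

ALTER TABLE invoices DROP PARTITION p_2009_05 UPDATE GLOBAL INDEXES;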

There is also the issue of referential integrity; Oracle documentation talks about using partitions for purging data mostly in the context of data warehouses, where tables tend to be hypercubes. Ours is closer to the OLTP end of things, and it is common for data in month X to have relationships to data in month Y. Creating the right partitioning keys for these tables would be ticklish at best.

As for the cache blowouts, I have read a bit about setting up dedicated buffer caches, but it seems like it's more on a per-table basis, as opposed to a per-user or per-transaction basis. To preserve the cache, I'd really like the reaping job to only keep one transaction's worth of data in the cache at any time, since there is no need to keep the data around once deleted.
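
What I've read about looks something like this, i.e. assigning a whole segment to the RECYCLE pool so its blocks age out quickly instead of displacing the default cache (this assumes a recycle cache has been sized, e.g. via DB_RECYCLE_CACHE_SIZE; the table name is illustrative):

-- Sketch only: steer this table's blocks into the RECYCLE buffer pool.
ALTER TABLE invoices STORAGE (BUFFER_POOL RECYCLE);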

Are we stuck using deletes for the foreseeable future, or are there other, more clever ways to deal with reaping?

Answered by David Aldridge

For the most part I think that you're stuck doing deletes.

Your comments on the difficulty of using partitions in your case probably do prevent them from being used effectively (different delete dates being used depending on the type of record), but is it possible that you could create a "delete date" column on the records that you could partition on? It would have the disadvantage of making updates quite expensive, as a change in the delete date might cause row migration, so your update would really be implemented as a delete and insert.
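
Purely as a sketch of what I mean (names and date ranges are made up):

-- Sketch only: range partition on a "delete date" so rows due for reaping
-- are physically clustered together; row movement is needed because an
-- update to delete_date may have to migrate the row to another partition.
CREATE TABLE invoices (
  invoice_id  NUMBER NOT NULL,
  account_id  NUMBER NOT NULL,
  delete_date DATE   NOT NULL
)
PARTITION BY RANGE (delete_date) (
  PARTITION p_2009_q3 VALUES LESS THAN (DATE '2009-10-01'),
  PARTITION p_2009_q4 VALUES LESS THAN (DATE '2010-01-01'),
  PARTITION p_future  VALUES LESS THAN (MAXVALUE)
)
ENABLE ROW MOVEMENT;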

It could be that even then you cannot use DDL partition operations to remove old data because of the referential integrity issues, but partitioning still might serve the purpose of physically clustering the rows to be deleted so that fewer blocks need to be modified in order to delete them, mitigating the impact on the buffer cache.

Answered by kdgregory

Deletes aren't that bad, provided that you rebuild your indexes. Oracle will recover the pages that no longer contain data.

However, as of 8i (and quite probably still), it would not properly recover index pages that no longer contained valid references. Worse, since the index leaves were chained, you could get into a situation where it would start walking the leaf nodes to find a row. This would cause a rather significant drop in performance: queries that would normally take seconds could take minutes. The drop was also very sudden: one day it would be fine, the next day it wouldn't.
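
If you suspect this is happening, you can get a rough picture of how much dead space an index is carrying with something like the following (the index name is illustrative; note that VALIDATE STRUCTURE locks the underlying table while it runs, and INDEX_STATS only holds the result of the most recent validation in your session):

ANALYZE INDEX invoices_date_idx VALIDATE STRUCTURE;

SELECT lf_rows,
       del_lf_rows,
       ROUND(100 * del_lf_rows / NULLIF(lf_rows, 0), 1) AS pct_deleted
  FROM index_stats;

ALTER INDEX invoices_date_idx REBUILD;   -- or COALESCE as a lighter-weight option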

I discovered this behavior (there was an Oracle bug for it, so other people have too) with an application that used increasing keys and regularly deleted data. Our solution was to invert portions of the key, but that's not going to help you with dates.

Answered by Cătălin Pitiș

What if you temporarily deactivate indexes, perform the deletes and then rebuild them? Would it improve the performance of your deletes? Of course, in this case you have to make sure the scripts are correct and ensure proper delete order and referential integrity.
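
Something along these lines, per index (the index name is illustrative; check the behaviour for indexes that enforce unique or primary key constraints before relying on this):

ALTER INDEX invoices_date_idx UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;   -- let the deletes ignore the unusable index

-- ... run the purge deletes here ...

ALTER INDEX invoices_date_idx REBUILD;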

Answered by slovon

We have the same problem, using the same strategy. If the situation becomes really bad (very fragmented allocation of indexes, tables, ...), we try to apply space reclamation actions.

Tables have to allow row movement (as for flashback): alter table TTT enable row movement; then alter table TTT shrink space; and then rebuild all indexes.

I don't know how you are placed for maintenance windows; if the application has to be usable all the time it is harder, but if not, you can do some "repacking" while it is off-line. "alter table TTT move tablespace SSSS" does a lot of work cleaning up the mess, as the table is rewritten. You can also specify new storage parameters such as extent management and sizes; take a look in the docs.
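
For example (table, tablespace and index names are illustrative); note that the move leaves the table's indexes unusable, so they have to be rebuilt afterwards:

ALTER TABLE invoices MOVE TABLESPACE users STORAGE (INITIAL 1M);

ALTER INDEX invoices_pk REBUILD;
ALTER INDEX invoices_date_idx REBUILD;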

I use a script like this to create a script for the whole database:

SET SQLPROMPT "-- "
SET ECHO OFF
SET NEWPAGE 0
SET SPACE 0
SET PAGESIZE 0
SET FEEDBACK OFF
SET HEADING OFF
SET TRIMSPOOL ON
SET TERMOUT OFF
SET VERIFY OFF
SET TAB OFF
spool doit.sql
select 'prompt Enabling row movement in '||table_name||'...'||CHR (10)||'alter table '||table_name||' enable row movement;' from user_tables where table_name not like '%$%' and table_name not like '%QTAB' and table_name not like 'SYS_%';
select 'prompt Setting initial ext for '||table_name||'...'||CHR (10)||'alter table '||table_name||' move storage (initial 1m);' from user_tables where table_name not like '%$%' and table_name not like '%QTAB' and table_name not like 'SYS_%';
select 'prompt Shrinking space for '||table_name||'...'||CHR (10)||'alter table '||table_name||' shrink space;' from user_tables where table_name not like '%$%' and table_name not like '%QTAB' and table_name not like 'SYS_%';
select 'prompt Rebuilding index '||index_name||'...'||CHR (10)||'alter index '||index_name||' rebuild;' from user_indexes where status = 'UNUSABLE';
spool off
prompt now check and then run @doit.sql
exit