How do I prevent the loading of duplicate rows into an Oracle table?

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1588588/

Time: 2020-09-18 19:14:37  Source: igfitidea

How do I prevent the loading of duplicate rows in to an Oracle table?

Tags: oracle, duplicates

Asked by user181903

I have some large tables (millions of rows). I constantly receive files containing new rows to add to those tables - up to 50 million rows per day. Around 0.1% of the rows I receive are duplicates of rows I have already loaded (or are duplicates within the files). I would like to prevent those rows from being loaded into the table.

I currently use SQL*Loader in order to have sufficient performance to cope with my large data volume. If I take the obvious step and add a unique index on the columns which govern whether or not a row is a duplicate, SQL*Loader will start to fail the entire file which contains the duplicate row - whereas I only want to prevent the duplicate row itself from being loaded.

I know that in SQL Server and Sybase I can create a unique index with the 'Ignore Duplicates' property and that if I then use BCP the duplicate rows (as defined by that index) will simply not be loaded.

Is there some way to achieve the same effect in Oracle?

I do not want to remove the duplicate rows once they have been loaded - it's important to me that they should never be loaded in the first place.

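Since some of the duplicates occur within a single incoming file, one workaround is to pre-filter each file before SQL*Loader ever sees it. Below is a minimal sketch in Python, assuming a comma-delimited feed whose first field is the unique key; the function name, delimiter, and key position are illustrative assumptions, and the in-memory set only works when all keys fit in RAM:

```python
import csv

def dedupe_file(in_path, out_path, key_cols=(0,)):
    """Copy in_path to out_path, dropping rows whose key fields have
    already appeared earlier in the same file. Returns (kept, dropped)."""
    seen = set()
    kept = dropped = 0
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout, lineterminator="\n")
        for row in reader:
            key = tuple(row[i] for i in key_cols)
            if key in seen:
                dropped += 1       # duplicate within this file: skip it
                continue
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept, dropped
```

This only removes within-file duplicates; rows duplicating data already in the table still need a database-side check such as a unique constraint.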

Answered by Adam Hawkes

What do you mean by "duplicate"? If you have a column which defines a unique row you should set up a unique constraint against that column. One typically creates a unique index on this column, which will automatically set up the constraint.

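For example, a unique constraint on hypothetical key columns (the table and column names below are placeholders, not from the question):

```sql
-- Reject any row whose (key1, key2) pair already exists in big_table;
-- Oracle backs the constraint with a unique index automatically
-- if a suitable index does not already exist.
ALTER TABLE big_table
  ADD CONSTRAINT big_table_uk UNIQUE (key1, key2);
```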

EDIT: Yes, as commented below you should set up a "bad" file for SQL*Loader to capture invalid rows. But I think that establishing the unique index is probably a good idea from a data-integrity standpoint.

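A minimal SQL*Loader control file along those lines might look like the sketch below (file, table, and column names are placeholders). With the unique constraint in place and a conventional-path load, rows that violate the constraint are rejected into the bad file instead of aborting the whole load, up to the ERRORS limit:

```
OPTIONS (ERRORS=1000000)          -- keep loading past rejected duplicates
LOAD DATA
INFILE  'daily_feed.dat'
BADFILE 'daily_feed.bad'          -- rejected (duplicate) rows land here
APPEND
INTO TABLE big_table
FIELDS TERMINATED BY ','
(key1, key2, payload)
```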

Answered by Cătălin Pitiș

Use the Oracle MERGE statement. Some explanations here.

Answered by Guru

You didn't say which release of Oracle you have. Have a look there for the MERGE command.

Basically it looks like this:

-- Merge all rows from the staging source temp_emp_rec
MERGE INTO hr.employees e
USING temp_emp_rec t
ON (e.emp_id = t.emp_id)
WHEN MATCHED THEN
    -- You can update
    UPDATE SET first_name = t.first_name,
               last_name  = t.last_name
WHEN NOT MATCHED THEN
    -- Insert into the table
    INSERT (emp_id, first_name, last_name)
    VALUES (t.emp_id, t.first_name, t.last_name);
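Since the question only asks to skip duplicates rather than update them, the WHEN MATCHED branch can be left out entirely (Oracle 10g and later allow a MERGE with a single branch), giving an insert-if-absent pattern over the same example tables:

```sql
-- Insert only the rows whose emp_id is not already present;
-- existing rows are silently left untouched.
MERGE INTO hr.employees e
USING temp_emp_rec t
ON (e.emp_id = t.emp_id)
WHEN NOT MATCHED THEN
    INSERT (emp_id, first_name, last_name)
    VALUES (t.emp_id, t.first_name, t.last_name);
```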

Answered by Ian Carpenter

I would use integrity constraints defined on the appropriate table columns.

This page from the Oracle Concepts manual gives an overview; if you also scroll down you will see what types of constraints are available.

Answered by abhishek Kumar

Use the option below; sqlldr will only terminate after it gets as many as 9999999 errors.

OPTIONS (ERRORS=9999999, DIRECT=FALSE)
LOAD DATA

You will get the duplicate records in the bad file.

sqlldr user/password@schema CONTROL=file.ctl, LOG=file.log, BAD=file.bad