将大量数据加载到 Oracle SQL 数据库

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2992456/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 20:46:14  来源:igfitidea点击:

Loading large amounts of data to an Oracle SQL Database

sqloracleinsertbulkinsertsql-loader

提问by James

I was wondering if anyone had any experience with what I am about to embark on. I have several csv files which are all around a GB or so in size and I need to load them into a an oracle database. While most of my work after loading will be read-only I will need to load updates from time to time. Basically I just need a good tool for loading several rows of data at a time up to my db.

我想知道是否有人对我即将开始的工作有任何经验。我有几个 csv 文件,大小都在 GB 左右,我需要将它们加载到 oracle 数据库中。虽然加载后我的大部分工作都是只读的,但我需要不时加载更新。基本上我只需要一个很好的工具来一次加载几行数据到我的数据库。

Here is what I have found so far:

这是我迄今为止发现的:

  1. I could use SQL Loader t do a lot of the work

  2. I could use Bulk-Insert commands

  3. Some sort of batch insert.

  1. 我可以使用 SQL Loader 来做很多工作

  2. 我可以使用批量插入命令

  3. 某种批量插入。

Using prepared statement somehow might be a good idea. I guess I was wondering what everyone thinks is the fastest way to get this insert done. Any tips?

以某种方式使用准备好的语句可能是一个好主意。我想我想知道每个人都认为完成此插入的最快方法是什么。有小费吗?

回答by DCookie

I would be very surprised if you could roll your own utility that will outperform SQL*Loader Direct Path Loads. Oracle built this utility for exactly this purpose - the likelihood of building something more efficient is practically nil. There is also the Parallel Direct Path Load, which allows you to have multiple direct path load processes running concurrently.

如果您可以推出自己的实用程序,使其性能优于SQL*Loader Direct Path Loads,我会感到非常惊讶。Oracle 正是为此目的构建了这个实用程序——构建更高效的东西的可能性几乎为零。还有Parallel Direct Path Load,它允许您同时运行多个直接路径加载进程。

From the manual:

从手册:

Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

与填充绑定数组缓冲区并使用 SQL INSERT 语句将其传递到 Oracle 数据库不同,直接路径加载使用直接路径 API 将要加载的数据传递到服务器中的加载引擎。加载引擎根据传递给它的数据构建一个列数组结构。

直接路径加载引擎使用列数组结构来格式化 Oracle 数据块并构建索引键。新格式化的数据库块直接写入数据库(如果主机平台支持异步 I/O,则每个 I/O 请求使用异步写入多个块)。

在内部,多个缓冲区用于格式化块。在填充一个缓冲区时,如果主机平台上有可用的异步 I/O,则会写入一个或多个缓冲区。与 I/O 重叠计算可提高负载性能。

There are cases where Direct Path Load cannot be used.

有些情况下不能使用直接路径加载。

回答by jim mcnamara

With that amount of data, you'd better be sure of your backing store - the dbf disks' free space.

有了这么多数据,您最好确定您的后备存储 - dbf 磁盘的可用空间。

sqlldr is script drive, very efficient, generally more efficient than a sql script. The only thing I wonder about is the magnitude of the data. I personally would consider several to many sqlldr processes and assign each one a subset of data and let the processes run in parallel.

sqlldr是脚本驱动,效率很高,一般比sql脚本效率更高。我唯一想知道的是数据的大小。我个人会考虑几个到多个 sqlldr 进程,并为每个进程分配一个数据子集,让这些进程并行运行。

You said you wanted to load a few records at a time? That may take a lot longer than you think. Did you mean a few files at a time?

你说你想一次加载几条记录?这可能需要比你想象的更长的时间。你是说一次几个文件?

回答by Stellios

You may be able to create an external table on the CSV files and load them in by SELECTing from the external table into another table. Whether this method will be quicker not sure however might be quicker in terms of messing around getting sql*loader to work especially when you have a criteria for UPDATEs.

您可以在 CSV 文件上创建一个外部表,并通过从外部表中选择到另一个表来加载它们。这种方法是否会更快不确定但是可能会更快让 sql*loader 工作,尤其是当您有更新标准时。