SQL 使用 .NET 批量插入到 Oracle
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/343299/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Bulk Insert to Oracle using .NET
提问by Salamander2007
What is the fastest way to do Bulk insert to Oracle using .NET? I need to transfer about 160K records using .NET to Oracle. Currently, I'm using insert statement and execute it 160K times.It takes about 25 minutes to complete. The source data is stored in a DataTable, as a result of query from another database (MySQL),
使用 .NET 对 Oracle 进行批量插入的最快方法是什么?我需要使用 .NET 将大约 160K 记录传输到 Oracle。目前,我正在使用插入语句并执行 160K 次。大约需要 25 分钟才能完成。源数据存储在 DataTable 中,作为来自另一个数据库 (MySQL) 的查询结果,
Is there any better way to do this?
有没有更好的方法来做到这一点?
EDIT: I'm currently using System.Data.OracleClient, but willing to accept solutions using another provider (ODP.NET, DevArt, etc..)
编辑:我目前正在使用 System.Data.OracleClient,但愿意接受使用其他提供商(ODP.NET、DevArt 等)的解决方案
回答by Damian
I'm loading 50,000 records in 15 or so seconds using Array Binding in ODP.NET
我正在使用 ODP.NET 中的数组绑定在 15 秒左右的时间内加载 50,000 条记录
It works by repeatedly invoking a stored procedure you specify (and in which you can do updates/inserts/deletes), but it passes the multiple parameter values from .NET to the database in bulk.
它通过重复调用您指定的存储过程(并且您可以在其中执行更新/插入/删除)来工作,但它会将多个参数值从 .NET 批量传递到数据库。
Instead of specifying a single value for each parameter to the stored procedure you specify an arrayof values for each parameter.
不是为存储过程的每个参数指定一个值,而是为每个参数指定一个值数组。
Oracle passes the parameter arrays from .NET to the database in one go, and then repeatedly invokes the stored procedure you specify using the parameter values you specified.
Oracle 将参数数组从 .NET 一次性传递到数据库,然后使用您指定的参数值重复调用您指定的存储过程。
http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html
/Damian
/达米安
回答by Tarik
I recently discovered a specialized class that's awesome for a bulk insert (ODP.NET). Oracle.DataAccess.Client.OracleBulkCopy! It takes a datatable as a parameter, then you call WriteTOServer method...it is very fast and effective, good luck!!
我最近发现了一个非常适合批量插入的专门类 (ODP.NET)。Oracle.DataAccess.Client.OracleBulkCopy!它以数据表为参数,然后调用WriteTOServer方法...非常快速有效,祝你好运!!
回答by Theo
The solution of Rob Stevenson-Legget is slow because he doesn't bind his values but he uses string.Format( ).
Rob Stevenson-Legget 的解决方案很慢,因为他没有绑定他的值,而是使用 string.Format()。
When you ask Oracle to execute a sql statement it starts with calculating the has value of this statement. After that it looks in a hash table whether it already knows this statement. If it already knows it statement it can retrieve its execution path from this hash table and execute this statement really fast because Oracle has executed this statement before. This is called the library cache and it doesn't work properly if you don't bind your sql statements.
当您要求 Oracle 执行 sql 语句时,它首先计算该语句的 has 值。之后它会在哈希表中查看它是否已经知道这个语句。如果它已经知道它的语句,它可以从这个哈希表中检索它的执行路径并非常快速地执行这个语句,因为 Oracle 之前已经执行过这个语句。这称为库缓存,如果您不绑定 sql 语句,它将无法正常工作。
For example don't do:
例如不要这样做:
int n;
国际n;
for (n = 0; n < 100000; n ++)
{
mycommand.CommandText = String.Format("INSERT INTO [MyTable] ([MyId]) VALUES({0})", n + 1);
mycommand.ExecuteNonQuery();
}
but do:
但做:
OracleParameter myparam = new OracleParameter();
int n;
mycommand.CommandText = "INSERT INTO [MyTable] ([MyId]) VALUES(?)";
mycommand.Parameters.Add(myparam);
for (n = 0; n < 100000; n ++)
{
myparam.Value = n + 1;
mycommand.ExecuteNonQuery();
}
Not using parameters can also cause sql injection.
不使用参数也会导致sql注入。
回答by Berend Engelbrecht
SQL Server's SQLBulkCopy is blindingly fast. Unfortunately, I found that OracleBulkCopy is far slower. Also it has problems:
SQL Server 的 SQLBulkCopy 非常快。不幸的是,我发现 OracleBulkCopy 的速度要慢得多。它也有问题:
- You must be very sure that your input data is clean if you plan to use OracleBulkCopy. If a primary key violation occurs, an ORA-26026 is raised and it appears to be unrecoverable. Trying to rebuild the index does not help and any subsequent insert on the table fails, also normal inserts.
- Even if the data is clean, I found that OracleBulkCopy sometimes gets stuck inside WriteToServer. The problem seems to depend on the batch size. In my test data, the problem would happen at the exact same point in my test when I repeat is. Use a larger or smaller batch size, and the problem does not happen. I see that the speed is more irregular on larger batch sizes, this points to problems related to memory management.
- 如果您计划使用 OracleBulkCopy,您必须非常确定您的输入数据是干净的。如果发生主键冲突,则会引发 ORA-26026 并且它似乎无法恢复。尝试重建索引无济于事,表上的任何后续插入都会失败,正常插入也是如此。
- 即使数据是干净的,我发现 OracleBulkCopy 有时也会卡在 WriteToServer 中。问题似乎取决于批量大小。在我的测试数据中,当我重复时,问题将发生在我测试中完全相同的点。使用更大或更小的批量大小,问题不会发生。我发现在更大的批量上速度更不规则,这表明与内存管理相关的问题。
Actually System.Data.OracleClient.OracleDataAdapter is faster than OracleBulkCopy if you want to fill a table with small records but many rows. You need to tune the batch size though, the optimum BatchSize for OracleDataAdapter is smaller than for OracleBulkCopy.
实际上 System.Data.OracleClient.OracleDataAdapter 比 OracleBulkCopy 更快,如果你想用小记录但很多行填充表。不过,您需要调整批量大小,OracleDataAdapter 的最佳 BatchSize 小于 OracleBulkCopy。
I ran my test on a Windows 7 machine with an x86 executable and the 32 bits ODP.Net client 2.112.1.0. . The OracleDataAdapter is part of System.Data.OracleClient 2.0.0.0. My test set is about 600,000 rows with a record size of max. 102 bytes (average size 43 chars). Data source is a 25 MB text file, read in line by line as a stream.
我在带有 x86 可执行文件和 32 位 ODP.Net 客户端 2.112.1.0 的 Windows 7 机器上运行我的测试。. OracleDataAdapter 是 System.Data.OracleClient 2.0.0.0 的一部分。我的测试集大约有 600,000 行,最大记录大小。102 字节(平均大小为 43 个字符)。数据源是一个 25 MB 的文本文件,作为流逐行读取。
In my test I built up the input data table to a fixed table size and then used either OracleBulkCopy or OracleDataAdapter to copy the data block to the server. I left BatchSize as 0 in OracleBulkCopy (so that the current table contents is copied as one batch) and set it to the table size in OracleDataAdapter (again that should create a single batch internally). Best results:
在我的测试中,我将输入数据表构建为固定的表大小,然后使用 OracleBulkCopy 或 OracleDataAdapter 将数据块复制到服务器。我在 OracleBulkCopy 中将 BatchSize 保留为 0(以便将当前表内容作为一个批次复制)并将其设置为 OracleDataAdapter 中的表大小(同样应该在内部创建一个批次)。最佳结果:
- OracleBulkCopy: table size = 500, total duration 4'22"
- OracleDataAdapter: table size = 100, total duration 3'03"
- OracleBulkCopy:表大小 = 500,总持续时间 4'22"
- OracleDataAdapter:表大小 = 100,总持续时间 3'03"
For comparison:
比较:
- SqlBulkCopy: table size = 1000, total duration 0'15"
- SqlDataAdapter: table size = 1000, total duration 8'05"
- SqlBulkCopy:表大小 = 1000,总持续时间 0'15"
- SqlDataAdapter:表大小 = 1000,总持续时间 8'05"
Same client machine, test server is SQL Server 2008 R2. For SQL Server, bulk copy is clearly the best way to go. Not only is it overall fastest, but server load is also lower than when using data adapter. It is a pity that OracleBulkCopy does not offer quite the same experience - the BulkCopy API is much easier to use than DataAdapter.
相同的客户端机器,测试服务器是 SQL Server 2008 R2。对于 SQL Server,批量复制显然是最好的方法。它不仅总体上是最快的,而且服务器负载也低于使用数据适配器时的负载。遗憾的是 OracleBulkCopy 没有提供完全相同的体验 - BulkCopy API 比 DataAdapter 更容易使用。
回答by JoshL
Finding the linked examples somewhat confusing, I worked out some code that demonstrates a working array insert into a test table (jkl_test). Here's the table:
发现链接的示例有些令人困惑,我编写了一些代码来演示将工作数组插入到测试表 (jkl_test) 中。这是表:
create table jkl_test (id number(9));
Here is .Net code for a simple Console application that connects to Oracle using ODP.Net and inserts an array of 5 integers:
这是一个简单的控制台应用程序的 .Net 代码,它使用 ODP.Net 连接到 Oracle 并插入一个包含 5 个整数的数组:
using Oracle.DataAccess.Client;
namespace OracleArrayInsertExample
{
class Program
{
static void Main(string[] args)
{
// Open a connection using ODP.Net
var connection = new OracleConnection("Data Source=YourDatabase; Password=YourPassword; User Id=YourUser");
connection.Open();
// Create an insert command
var command = connection.CreateCommand();
command.CommandText = "insert into jkl_test values (:ids)";
// Set up the parameter and provide values
var param = new OracleParameter("ids", OracleDbType.Int32);
param.Value = new int[] { 22, 55, 7, 33, 11 };
// This is critical to the process; in order for the command to
// recognize and bind arrays, an array bind count must be specified.
// Set it to the length of the array.
command.ArrayBindCount = 5;
command.Parameters.Add(param);
command.ExecuteNonQuery();
}
}
}
回答by Theo
A really fast way to solve this problem is to make a database link from the Oracle database to the MySQL database. You can create database links to non-Oracle databases. After you have created the database link you can retrieve your data from the MySQL database with a ... create table mydata as select * from ... statement. This is called heterogeneous connectivity. This way you don't have to do anything in your .net application to move the data.
解决这个问题的一个非常快速的方法是建立一个从 Oracle 数据库到 MySQL 数据库的数据库链接。您可以创建到非 Oracle 数据库的数据库链接。创建数据库链接后,您可以使用 ... create table mydata as select * from ... 语句从 MySQL 数据库中检索数据。这称为异构连接。这样您就不必在 .net 应用程序中执行任何操作来移动数据。
Another way is to use ODP.NET. In ODP.NET you can use the OracleBulkCopy-class.
另一种方法是使用 ODP.NET。在 ODP.NET 中,您可以使用 OracleBulkCopy 类。
But I don't think that inserting 160k records in an Oracle table with System.Data.OracleClient should take 25 minutes. I think you commit too many times. And do you bind your values to the insert statement with parameters or do you concatenate your values. Binding is much faster.
但我不认为使用 System.Data.OracleClient 在 Oracle 表中插入 160k 条记录应该需要 25 分钟。我觉得你犯了太多次了。并且您是将您的值绑定到带有参数的插入语句还是连接您的值。绑定要快得多。
回答by Neil
To follow up on Theo's suggestion with my findings (apologies - I don't currently have enough reputation to post this as a comment)
根据我的发现跟进 Theo 的建议(抱歉 - 我目前没有足够的声誉将此作为评论发布)
First, this is how to use several named parameters:
首先,这是如何使用几个命名参数:
String commandString = "INSERT INTO Users (Name, Desk, UpdateTime) VALUES (:Name, :Desk, :UpdateTime)";
using (OracleCommand command = new OracleCommand(commandString, _connection, _transaction))
{
command.Parameters.Add("Name", OracleType.VarChar, 50).Value = strategy;
command.Parameters.Add("Desk", OracleType.VarChar, 50).Value = deskName ?? OracleString.Null;
command.Parameters.Add("UpdateTime", OracleType.DateTime).Value = updated;
command.ExecuteNonQuery();
}
However, I saw no variation in speed between:
但是,我看到以下之间的速度没有变化:
- constructing a new commandString for each row (String.Format)
- constructing a now parameterized commandString for each row
- using a single commandString and changing the parameters
- 为每一行构造一个新的 commandString (String.Format)
- 为每一行构造一个现在参数化的 commandString
- 使用单个 commandString 并更改参数
I'm using System.Data.OracleClient, deleting and inserting 2500 rows inside a transaction
我正在使用 System.Data.OracleClient,在事务中删除和插入 2500 行
回答by S.Lott
Oracle says (http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html)
Oracle 说(http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html)
SQL*Loader is the primary method for quickly populating Oracle tables with data from external files
SQL*Loader 是使用来自外部文件的数据快速填充 Oracle 表的主要方法
My experience is that their loader loads their tables faster than anything else.
我的经验是他们的加载器加载表的速度比其他任何东西都快。
回答by 6opuc
If you are using unmanaged oracle client (Oracle.DataAccess) then the fastest way is to use OracleBulkCopy, as was pointed by Tarik.
如果您使用的是非托管 Oracle 客户端 (Oracle.DataAccess),那么最快的方法是使用 OracleBulkCopy,正如 Tarik 所指出的那样。
If you are using latest managed oracle client (Oracle.ManagedDataAccess) then the fastest way is to use array binding, as was pointed by Damien. If you wish keep your application code clean from array binding specifics, you could write your own implementation of OracleBulkCopy using array binding.
如果您使用的是最新的托管 oracle 客户端 (Oracle.ManagedDataAccess),那么最快的方法是使用数组绑定,正如 Damien 所指出的那样。如果您希望保持应用程序代码与数组绑定细节无关,您可以使用数组绑定编写自己的 OracleBulkCopy 实现。
Here is usage example from real project:
这是来自真实项目的使用示例:
var bulkWriter = new OracleDbBulkWriter();
bulkWriter.Write(
connection,
"BULK_WRITE_TEST",
Enumerable.Range(1, 10000).Select(v => new TestData { Id = v, StringValue=v.ToString() }).ToList());
10K records are inserted in 500ms!
10K 条记录在 500 毫秒内插入!
Here is implementation:
这是实现:
public class OracleDbBulkWriter : IDbBulkWriter
{
public void Write<T>(IDbConnection connection, string targetTableName, IList<T> data, IList<ColumnToPropertyMapping> mappings = null)
{
if (connection == null)
{
throw new ArgumentNullException(nameof(connection));
}
if (string.IsNullOrEmpty(targetTableName))
{
throw new ArgumentNullException(nameof(targetTableName));
}
if (data == null)
{
throw new ArgumentNullException(nameof(data));
}
if (mappings == null)
{
mappings = GetGenericMappings<T>();
}
mappings = GetUniqueMappings<T>(mappings);
Dictionary<string, Array> parameterValues = InitializeParameterValues<T>(mappings, data.Count);
FillParameterValues(parameterValues, data);
using (var command = CreateCommand(connection, targetTableName, mappings, parameterValues))
{
command.ExecuteNonQuery();
}
}
private static IDbCommand CreateCommand(IDbConnection connection, string targetTableName, IList<ColumnToPropertyMapping> mappings, Dictionary<string, Array> parameterValues)
{
var command = (OracleCommandWrapper)connection.CreateCommand();
command.ArrayBindCount = parameterValues.First().Value.Length;
foreach(var mapping in mappings)
{
var parameter = command.CreateParameter();
parameter.ParameterName = mapping.Column;
parameter.Value = parameterValues[mapping.Property];
command.Parameters.Add(parameter);
}
command.CommandText = $@"insert into {targetTableName} ({string.Join(",", mappings.Select(m => m.Column))}) values ({string.Join(",", mappings.Select(m => $":{m.Column}")) })";
return command;
}
private IList<ColumnToPropertyMapping> GetGenericMappings<T>()
{
var accessor = TypeAccessor.Create(typeof(T));
var mappings = accessor.GetMembers()
.Select(m => new ColumnToPropertyMapping(m.Name, m.Name))
.ToList();
return mappings;
}
private static IList<ColumnToPropertyMapping> GetUniqueMappings<T>(IList<ColumnToPropertyMapping> mappings)
{
var accessor = TypeAccessor.Create(typeof(T));
var members = new HashSet<string>(accessor.GetMembers().Select(m => m.Name));
mappings = mappings
.Where(m => m != null && members.Contains(m.Property))
.GroupBy(m => m.Column)
.Select(g => g.First())
.ToList();
return mappings;
}
private static Dictionary<string, Array> InitializeParameterValues<T>(IList<ColumnToPropertyMapping> mappings, int numberOfRows)
{
var values = new Dictionary<string, Array>(mappings.Count);
var accessor = TypeAccessor.Create(typeof(T));
var members = accessor.GetMembers().ToDictionary(m => m.Name);
foreach(var mapping in mappings)
{
var member = members[mapping.Property];
values[mapping.Property] = Array.CreateInstance(member.Type, numberOfRows);
}
return values;
}
private static void FillParameterValues<T>(Dictionary<string, Array> parameterValues, IList<T> data)
{
var accessor = TypeAccessor.Create(typeof(T));
for (var rowNumber = 0; rowNumber < data.Count; rowNumber++)
{
var row = data[rowNumber];
foreach (var pair in parameterValues)
{
Array parameterValue = pair.Value;
var propertyValue = accessor[row, pair.Key];
parameterValue.SetValue(propertyValue, rowNumber);
}
}
}
}
NOTE: this implementation uses Fastmember package for optimized access to properties(much faster than reflection)
注意:此实现使用 Fastmember 包来优化对属性的访问(比反射快得多)
回答by bernd_k
I guess that OracleBulkCopy is one of the fastest ways. I had some trouble to learn, that I needed a new ODAC version. Cf. Where is type [Oracle.DataAccess.Client.OracleBulkCopy] ?
我猜 OracleBulkCopy 是最快的方法之一。我在学习时遇到了一些麻烦,我需要一个新的 ODAC 版本。参见 类型 [Oracle.DataAccess.Client.OracleBulkCopy] 在哪里?
Here is the complete PowerShell code to copy from a query into a suited existing Oracle table. I tried Sql-Server a datasource, but other valid OLE-DB sources will go to.
这是从查询复制到合适的现有 Oracle 表的完整 PowerShell 代码。我试过 Sql-Server 一个数据源,但其他有效的 OLE-DB 源会去。
if ($ora_dll -eq $null)
{
"Load Oracle dll"
$ora_dll = [System.Reflection.Assembly]::LoadWithPartialName("Oracle.DataAccess")
$ora_dll
}
# sql-server or Oracle source example is sql-server
$ConnectionString ="server=localhost;database=myDatabase;trusted_connection=yes;Provider=SQLNCLI10;"
# Oracle destination
$oraClientConnString = "Data Source=myTNS;User ID=myUser;Password=myPassword"
$tableName = "mytable"
$sql = "select * from $tableName"
$OLEDBConn = New-Object System.Data.OleDb.OleDbConnection($ConnectionString)
$OLEDBConn.open()
$readcmd = New-Object system.Data.OleDb.OleDbCommand($sql,$OLEDBConn)
$readcmd.CommandTimeout = '300'
$da = New-Object system.Data.OleDb.OleDbDataAdapter($readcmd)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt)
$OLEDBConn.close()
#Write-Output $dt
if ($dt)
{
try
{
$bulkCopy = new-object ("Oracle.DataAccess.Client.OracleBulkCopy") $oraClientConnString
$bulkCopy.DestinationTableName = $tableName
$bulkCopy.BatchSize = 5000
$bulkCopy.BulkCopyTimeout = 10000
$bulkCopy.WriteToServer($dt)
$bulkcopy.close()
$bulkcopy.Dispose()
}
catch
{
$ex = $_.Exception
Write-Error "Write-DataTable$($connectionName):$ex.Message"
continue
}
}
BTW: I use this to copy table with CLOB columns. I didn't get that to work using linked servers cf. question on dba. I didn't retry linked serves with the new ODAC.
顺便说一句:我用它来复制带有 CLOB 列的表。我没有使用链接服务器cf 让它工作。关于 dba 的问题。我没有重试与新 ODAC 的链接服务。