Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it you must do so under the same license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/24734131/

INSERT of 10 million queries under 10 minutes in Oracle?

Tags: sql, database, oracle, oci, bulk-load

Asked by badola

I am working on a file loader program.

The purpose of this program is to take an input file, apply some transformations to its data, and then load the data into an Oracle database.

The problem I am facing is that I need to optimize the insertion of very large volumes of input data into Oracle.

I am uploading data into a table, let's say ABC.

I am using the OCI library provided by Oracle in my C++ program. Specifically, I am using an OCI connection pool for multi-threaded loading into Oracle. (http://docs.oracle.com/cd/B28359_01/appdev.111/b28395/oci09adv.htm)

The following are the DDL statements that have been used to create the table ABC –

CREATE TABLE ABC(
   seq_no         NUMBER NOT NULL,
   ssm_id         VARCHAR2(9)  NOT NULL,
   invocation_id  VARCHAR2(100)  NOT NULL,
   analytic_id    VARCHAR2(100) NOT NULL,
   analytic_value NUMBER NOT NULL,
   override       VARCHAR2(1)  DEFAULT  'N'   NOT NULL,
   update_source  VARCHAR2(255) NOT NULL,
   last_chg_user  CHAR(10)  DEFAULT  USER NOT NULL,
   last_chg_date  TIMESTAMP(3) DEFAULT  SYSTIMESTAMP NOT NULL
);

CREATE UNIQUE INDEX ABC_indx ON ABC(seq_no, ssm_id, invocation_id, analytic_id);
/
CREATE SEQUENCE ABC_seq;
/

CREATE OR REPLACE TRIGGER ABC_insert
BEFORE INSERT ON ABC
FOR EACH ROW
BEGIN
SELECT ABC_seq.nextval INTO :new.seq_no FROM DUAL;
END;

I am currently using the following query pattern to upload the data into the database, sending batches of 500 rows per statement across the threads of the OCI connection pool.

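The batching described above can be sketched as a small helper that splits the per-row SELECT fragments into batches and glues each batch into one `INSERT ... SELECT ... UNION ALL` statement. This is an illustrative sketch, not the actual loader code; the function and parameter names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split a list of per-row SELECT fragments (e.g. "select 'c','b',... from dual")
// into batches of batchSize rows, gluing each batch into a single
// "insert ... select ... union all ..." statement.
std::vector<std::string> buildBatchedInserts(
    const std::vector<std::string>& rowSelects,
    std::size_t batchSize,
    const std::string& table)
{
    std::vector<std::string> statements;
    const std::string header =
        "insert into " + table +
        " (SSM_ID, invocation_id, calc_id, analytic_id,"
        " analytic_value, override, update_source)\n";

    for (std::size_t start = 0; start < rowSelects.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, rowSelects.size());
        std::string stmt = header + rowSelects[start];
        for (std::size_t i = start + 1; i < end; ++i)
            stmt += "\nunion all " + rowSelects[i];
        statements.push_back(stmt);
    }
    return statements;
}
```

Each resulting statement is then prepared and executed once, so a batch of 500 rows costs a single round trip.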
Sample of SQL insert query used -

insert into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source)
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

EXECUTION PLAN by Oracle for the above query -

-----------------------------------------------------------------------------
| Id  | Operation                | Name|Rows| Cost (%CPU) | Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |     | 4  |     8   (0) | 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | ABC |    |             |          |
|   2 |   UNION-ALL              |     |    |             |          |
|   3 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   4 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   5 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |
|   6 |    FAST DUAL             |     | 1  |     2   (0) | 00:00:01 |

The Run times of the program loading 1 million lines -

Batch Size = 500
Number of threads   Execution time
10                  4:19
20                  1:58
30                  1:17
40                  1:34
45                  2:06
50                  1:21
60                  1:24
70                  1:41
80                  1:43
90                  2:17
100                 2:06


Average Run Time = 1:57    (Roughly 2 minutes)

I need to optimize and reduce this time further. The problem arises when I upload 10 million rows.

The average run time for 10 million rows came out to be 21 minutes.

(My target is to reduce this time to below 10 minutes)

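As a back-of-the-envelope check of what the target implies: the measured average of roughly 2 minutes (117 s) per 1 million rows is about 8,500 rows/second, while 10 million rows in 10 minutes requires a sustained rate of about 16,666 rows/second, roughly double. A trivial helper makes the arithmetic explicit (the function name is illustrative):

```cpp
// Sustained throughput needed to load a given number of rows
// in a given number of seconds (integer rows/second).
long long rowsPerSecond(long long rows, long long seconds)
{
    return rows / seconds;
}
```

So the loader needs to roughly double its measured throughput to hit the target.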
So I tried the following steps as well -

[1] Partitioned the table ABC by seq_no, using 30 partitions. Tested with 1 million rows: the performance was very poor, with a load time almost 4 times that of the unpartitioned table.

[2] Partitioned the table ABC by last_chg_date, again using 30 partitions.

2.a) Tested with 1 million rows: the performance was almost equal to the unpartitioned table. The difference was negligible, so this approach was not pursued.

2.b) Tested the same again with 10 million rows. The performance was almost equal to the unpartitioned table. No noticeable difference.

The following DDL commands were used to set up the partitioning -

CREATE TABLESPACE ts1 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts2 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts3 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts4 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts5 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts6 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts7 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts8 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts9 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts10 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts11 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts12 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts13 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts14 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts15 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts16 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts17 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts18 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts19 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts20 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts21 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts22 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts23 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts24 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts25 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts26 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts27 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts28 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts29 DATAFILE AUTOEXTEND ON;
CREATE TABLESPACE ts30 DATAFILE AUTOEXTEND ON;

CREATE TABLE ABC(
   seq_no           NUMBER NOT NULL,
   ssm_id           VARCHAR2(9)  NOT NULL,
   invocation_id    VARCHAR2(100)  NOT NULL,
   calc_id          VARCHAR2(100) NULL,
   analytic_id      VARCHAR2(100) NOT NULL,
   ANALYTIC_VALUE   NUMBER NOT NULL,
   override         VARCHAR2(1)  DEFAULT  'N'   NOT NULL,
   update_source    VARCHAR2(255) NOT NULL,
   last_chg_user    CHAR(10)  DEFAULT  USER NOT NULL,
   last_chg_date    TIMESTAMP(3) DEFAULT  SYSTIMESTAMP NOT NULL
)
PARTITION BY HASH(last_chg_date)
PARTITIONS 30
STORE IN (ts1, ts2, ts3, ts4, ts5, ts6, ts7, ts8, ts9, ts10, ts11, ts12, ts13,
ts14, ts15, ts16, ts17, ts18, ts19, ts20, ts21, ts22, ts23, ts24, ts25, ts26,
ts27, ts28, ts29, ts30);

The code I am using in the thread function (written in C++, using OCI) -

void OracleLoader::bulkInsertThread(std::vector<std::string> const & statements)
{

    try
    {
        INFO("ORACLE_LOADER_THREAD","Entered Thread = %1%", m_env);
        string useOraUsr = "some_user";
        string useOraPwd = "some_password";

        int user_name_len   = useOraUsr.length();
        int passwd_name_len = useOraPwd.length();

        text* username((text*)useOraUsr.c_str());
        text* password((text*)useOraPwd.c_str());


        if(! m_env)
        {
            CreateOraEnvAndConnect();
        }
        OCISvcCtx *m_svc = (OCISvcCtx *) 0;
        OCIStmt *m_stm = (OCIStmt *)0;

        checkerr(m_err,OCILogon2(m_env,
                                 m_err,
                                 &m_svc,
                                 (CONST OraText *)username,
                                 user_name_len,
                                 (CONST OraText *)password,
                                 passwd_name_len,
                                 (CONST OraText *)poolName,
                                 poolNameLen,
                                 OCI_CPOOL));

        OCIHandleAlloc(m_env, (dvoid **)&m_stm, OCI_HTYPE_STMT, (size_t)0, (dvoid **)0);

////////// Execution Queries in the format of - /////////////////
//        insert into pm_own.sec_analytics (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, override, update_source)
//        select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
//        union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual
//////////////////////////////////////////////////////////////////

        size_t startOffset = 0;
        const int batch_size = PCSecAnalyticsContext::instance().getBatchCount();
        while (startOffset < statements.size())
        {
            int remaining = (startOffset + batch_size < statements.size() ) ? batch_size : (statements.size() - startOffset );
            // Break the query vector to meet the batch size
            std::vector<std::string> items(statements.begin() + startOffset,
                                           statements.begin() + startOffset + remaining);

            //! Preparing the Query
            std::string insert_query = "insert into ";
            insert_query += Context::instance().getUpdateTable();
            insert_query += " (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, override, update_source)\n";

            std::vector<std::string>::const_iterator i3 = items.begin();
            insert_query += *i3 ;

            for( i3 = items.begin() + 1; i3 != items.end(); ++i3)
                insert_query += "union all " + *i3;
            // Preparing the Statement and Then Executing it in the next step
            text *txtQuery((text *)(insert_query).c_str());
            checkerr(m_err, OCIStmtPrepare (m_stm, m_err, txtQuery, strlen((char *)txtQuery), OCI_NTV_SYNTAX, OCI_DEFAULT));
            checkerr(m_err, OCIStmtExecute (m_svc, m_stm, m_err, (ub4)1, (ub4)0, (OCISnapshot *)0, (OCISnapshot *)0, OCI_DEFAULT ));

            startOffset += batch_size;
        }

        // Here is the commit statement. I am committing at the end of each thread.
        checkerr(m_err, OCITransCommit(m_svc,m_err,(ub4)0));

        checkerr(m_err, OCIHandleFree((dvoid *) m_stm, OCI_HTYPE_STMT));
        checkerr(m_err, OCILogoff(m_svc, m_err));

        INFO("ORACLE_LOADER_THREAD","Thread Complete. Leaving Thread.");
    }

    catch(AnException &ex)
    {
        ERROR("ORACLE_LOADER_THREAD", "Oracle query failed with : %1%", std::string(ex.what()));
        throw AnException(string("Oracle query failed with : ") + ex.what());
    }
}

While this post was being answered, several methods were suggested to me for optimizing my INSERT query. I chose QUERY I for my program, for reasons I discovered while testing the various INSERT queries that were suggested to me. QUERY I -

insert into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source)
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

EXECUTION PLAN by Oracle for Query I -

--------------------------------------------------------------------------
| Id  | Operation                | Name| Rows | Cost (%CPU)   | Time     |
--------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |     |  4   | 8   (0)       | 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | ABC |      |               |          |
|   2 |   UNION-ALL              |     |      |               |          |
|   3 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   4 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   5 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |
|   6 |    FAST DUAL             |     |  1   | 2   (0)       | 00:00:01 |

QUERY II -

insert all
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','b',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','e',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','r',NULL, 'test', 123 , 'N', 'asdf')
into ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value,
override, update_source) values ('c','t',NULL, 'test', 123 , 'N', 'asdf')
select 1 from dual

EXECUTION PLAN by Oracle for Query II -

-----------------------------------------------------------------------------
| Id  | Operation           | Name| Rows  | Cost (%CPU)   | Time     |
-----------------------------------------------------------------------------
|   0 | INSERT STATEMENT    |     | 1     |     2   (0)   | 00:00:01 |
|   1 |  MULTI-TABLE INSERT |     |       |               |          |
|   2 |   FAST DUAL         |     | 1     |     2   (0)   | 00:00:01 |
|   3 |   INTO              | ABC |       |               |          |
|   4 |   INTO              | ABC |       |               |          |
|   5 |   INTO              | ABC |       |               |          |
|   6 |   INTO              | ABC |       |               |          |

As per the experiments, Query I is faster.

I tested this both in Oracle SQL Developer and by sending the insert queries from my C++ program (FILELOADER).

On reading further, I found that the cost shown in the execution plan is an estimate of the CPU work the query will use to process itself. That suggests Oracle will use more CPU to process the first query, which is why its cost comes out as 8.

Even when using the same insert pattern from my application, I found its performance was almost 1.5 times better.

I need some insight on how I can improve the performance even further. I have summarized everything I have tried in this question, and I will add anything relevant that I discover.

My target is to bring the upload time for 10 million rows under 10 minutes.

Accepted answer by Ben

I know others have mentioned this and you don't want to hear it, but use SQL*Loader or external tables. My average load time for tables of approximately the same width is 12.57 seconds for just over 10m rows. These utilities have been explicitly designed to load data into the database quickly and are pretty good at it. This may incur some additional time penalties depending on the format of your input file, but there are quite a few options and I've rarely had to change files prior to loading.

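To make the SQL*Loader suggestion concrete, a direct-path load into the staging table might look something like the sketch below. This is an illustration only: the file names, the abc_load staging table, and the CSV layout are assumptions, not part of the original question.

```
-- abc.ctl (hypothetical control file; names and format are assumptions)
OPTIONS (DIRECT=TRUE, ERRORS=1000)
LOAD DATA
INFILE 'abc_data.csv'
APPEND
INTO TABLE abc_load
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(ssm_id, invocation_id, calc_id, analytic_id,
 analytic_value, override, update_source)
```

It would then be invoked along the lines of `sqlldr userid=some_user/some_password control=abc.ctl`. Note that a direct-path load does not fire INSERT triggers, which fits the staging-table approach below, where the sequence is applied in the final INSERT ... SELECT.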
If you're unwilling to do this then you don't have to upgrade your hardware yet; you need to remove every possible impediment to loading this quickly. To enumerate them, remove:

  1. The index
  2. The trigger
  3. The sequence
  4. The partition

With all of these you're obliging the database to perform more work and because you're doing this transactionally, you're not using the database to its full potential.

Load the data into a separate table, say ABC_LOAD. After the data has been completely loaded, perform a single INSERT statement into ABC.

insert into abc
select abc_seq.nextval, a.*
  from abc_load a

When you do this (and even if you don't) ensure that the sequence cache size is correct; to quote:

When an application accesses a sequence in the sequence cache, the sequence numbers are read quickly. However, if an application accesses a sequence that is not in the cache, then the sequence must be read from disk to the cache before the sequence numbers are used.

If your applications use many sequences concurrently, then your sequence cache might not be large enough to hold all the sequences. In this case, access to sequence numbers might often require disk reads. For fast access to all sequences, be sure your cache has enough entries to hold all the sequences used concurrently by your applications.

This means that if you have 10 threads concurrently writing 500 records each using this sequence, then you need a cache size of 5,000. The ALTER SEQUENCE documentation states how to change this:

alter sequence abc_seq cache 5000
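
The sizing rule above (one batch's worth of sequence values per concurrently writing thread) can be captured in a trivial helper; the function name is illustrative:

```cpp
#include <cstddef>

// Rule of thumb from the answer above: size the sequence cache so that
// no concurrently writing thread has to refresh it mid-batch.
std::size_t sequenceCacheSize(std::size_t threads, std::size_t batchSize)
{
    return threads * batchSize;
}
```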

If you follow my suggestion I'd up the cache size to something around 10.5m.

Look into using the APPEND hint (see also Oracle Base); this instructs Oracle to use a direct-path insert, which appends data directly to the end of the table rather than searching for free space to put it. You won't be able to use this if your table has indexes, but you could use it on ABC_LOAD:

insert /*+ append */ into ABC (SSM_ID, invocation_id , calc_id, ... )
select 'c','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'a','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'b','b',NULL, 'test', 123 , 'N', 'asdf' from dual
union all select 'c','g',NULL, 'test', 123 , 'N', 'asdf' from dual

If you use the APPEND hint, I'd TRUNCATE ABC_LOAD after you've inserted into ABC; otherwise the table will grow indefinitely. This should be safe as you will have finished using the table by then.

You don't mention what version or edition of Oracle you're using. There are a number of extra little tricks you can use:

  • Oracle 12c

    This version supports identity columns; you could get rid of the sequence completely.

    CREATE TABLE ABC(
       seq_no         NUMBER GENERATED AS IDENTITY (increment by 5000)
    
  • Oracle 11g r2

    If you keep the trigger, you can assign the sequence value directly:

    :new.seq_no := ABC_seq.nextval;
    
  • Oracle Enterprise Edition

    If you're using Oracle Enterprise Edition you can speed up the INSERT from ABC_LOAD by using the PARALLEL hint:

    insert /*+ parallel */ into abc
    select abc_seq.nextval, a.*
      from abc_load a
    

    This can cause its own problems (too many parallel processes etc.), so test. It might help for the smaller batch inserts, but it's less likely as you'll lose time computing which thread should process what.

tl;dr

Use the utilities that come with the database.

If you can't use them then get rid of everything that might slow the insert down and do it in bulk, 'cause that's what the database is good at.

Answer by borjab

If you have a text file you should try SQL*Loader with the direct path option. It is really fast and is designed for exactly this kind of massive data load. Have a look at the options that can improve its performance.

As a secondary advantage for ETL, your clear-text file will be smaller and easier to audit than 10^7 INSERT statements.

If you need to apply some transformations, you can do them afterwards in Oracle.

Answer by xacinay

You should try bulk-inserting your data. For that purpose you can use OCI*ML; a discussion of it is here, and a notable article is here. Or you may try Oracle's bulk loader, SQL*Loader (SQLLDR), itself to increase your upload speed. To do that, serialize the data into a CSV file and call SQLLDR, passing the CSV as an argument.

Another possible optimization is the transaction strategy: try inserting all data in one transaction per thread/connection.

Another approach is to use a multi-table INSERT (INSERT ALL):

INSERT ALL
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('c','b',NULL, 'test', 123 , 'N', 'asdf')
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('a','b',NULL, 'test', 123 , 'N', 'asdf')
   INTO ABC (SSM_ID, invocation_id , calc_id, analytic_id, analytic_value, 
   override, update_source ) VALUES ('b','b',NULL, 'test', 123 , 'N', 'asdf')

SELECT 1 FROM DUAL;

instead of insert .. union all.

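Assembling this alternative form on the client side is a small variation of the string-building already done in the question's loader. A sketch (the column list is taken from the question; the function name and pre-formatted value tuples are assumptions):

```cpp
#include <string>
#include <vector>

// Build a multi-table "INSERT ALL ... SELECT 1 FROM DUAL" statement
// from pre-formatted value tuples such as
// "('c','b',NULL, 'test', 123, 'N', 'asdf')".
std::string buildInsertAll(const std::vector<std::string>& valueTuples,
                           const std::string& table)
{
    std::string stmt = "INSERT ALL\n";
    for (const std::string& tuple : valueTuples) {
        stmt += "   INTO " + table +
                " (SSM_ID, invocation_id, calc_id, analytic_id,"
                " analytic_value, override, update_source) VALUES " +
                tuple + "\n";
    }
    stmt += "SELECT 1 FROM DUAL";
    return stmt;
}
```
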
Your sample data looks interdependent, which suggests inserting one significant row and then expanding it into four rows with a post-insert SQL query.

Also, turn off all indexes before the batch insert (or drop them and re-create them once the bulk load is done). A table index reduces insert performance even though you aren't actually using it at that time (it computes an entry for every inserted row and performs the corresponding maintenance).

Using prepared-statement syntax should speed up the upload routine, since the server will already have a parsed, cached statement.

Then, optimize your C++ code: move invariant operations out of the loop:

 //! Prepare the invariant part of the query once, outside the loop
 std::string insert_query = "insert into ";
 insert_query += Context::instance().getUpdateTable();
 insert_query += " (SSM_ID, invocation_id , calc_id, analytic_id,"
                 " analytic_value, override, update_source)\n";
 while (startOffset < statements.size())
 { ... }

Answer by Maksim Gumerov

By the way, did you try increasing the number of physical clients, not just threads, by running on several VMs in a cloud or on several physical machines? I recently read comments, I believe from Aerospike developers, explaining that many people are unable to reproduce their results simply because they don't realize how hard it is to make a single client actually send that many queries per second (above 1M per second in their case). For instance, for their benchmark they had to run 4 clients in parallel. Maybe this particular Oracle driver is simply not fast enough to support more than 7-8 thousand requests per second from a single machine?
