Performance Issue with Oracle Merge Statements with more than 2 Million records

Note: this content is from StackOverflow and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must attribute the original authors (not the translator). Original: http://stackoverflow.com/questions/21669615/

Tags: performance, oracle, merge

Asked by user2819874

I am executing the MERGE statement below for an insert/update (upsert) operation.

It works fine for 1 to 2 million records, but for more than 4 to 5 million records it takes 6 to 7 hours to complete.

Can anyone suggest alternatives or performance tips for the MERGE statement?

 merge into employee_payment ep
  using (
     select
        p.pay_id vista_payroll_id,
        p.pay_date pay_dte,
        c.client_id client_id,
        c.company_id company_id,
        case p.uni_ni when 0 then null else u.unit_id end unit_id,
        p.pad_seq pay_dist_seq_nbr,
        ph.payroll_header_id payroll_header_id,
        p.pad_id vista_paydist_id,
        p.pad_beg_payperiod pay_prd_beg_dt,
        p.pad_end_payperiod pay_prd_end_dt
     from
        stg_paydist p
        inner join company c on c.vista_company_id = p.emp_ni
        inner join payroll_header ph on ph.vista_payroll_id = p.pay_id
        left outer join unit u on u.vista_unit_id = p.uni_ni
     where ph.deleted = '0'
  ) ps
     on (ps.vista_paydist_id = ep.vista_paydist_id)

  when matched then
  update
     set ep.vista_payroll_id      = ps.vista_payroll_id,
         ep.pay_dte               = ps.pay_dte,
         ep.client_id             = ps.client_id,
         ep.company_id            = ps.company_id,
         ep.unit_id               = ps.unit_id,
         ep.pay_dist_seq_nbr      = ps.pay_dist_seq_nbr,
         ep.payroll_header_id     = ps.payroll_header_id

  when not matched then
  insert (
     ep.employee_payment_id,
     ep.vista_payroll_id,
     ep.pay_dte,
     ep.client_id,
     ep.company_id,
     ep.unit_id,
     ep.pay_dist_seq_nbr,
     ep.payroll_header_id,
     ep.vista_paydist_id
  ) values (
     seq_employee_payments.nextval,
     ps.vista_payroll_id,
     ps.pay_dte,
     ps.client_id,
     ps.company_id,
     ps.unit_id,
     ps.pay_dist_seq_nbr,
     ps.payroll_header_id,
     ps.vista_paydist_id
  ) log errors into errorlog (v_batch || 'EMPLOYEE_PAYMENT') reject limit unlimited;

Answered by Ronnis

Processing lots of data takes lots of time... Here are some things that may help you (assuming the problem is not a bad execution plan):

  1. Add a where-clause to the UPDATE part so that records are only updated when the values are actually different (see the sketch after this list). If you are merging the same data over and over again and only a smaller subset of it is actually modified, this will improve performance.

  2. If you indeed are processing the same data over and over again, investigate whether you can add some modification flag/date so that only records that are new since the last run are processed.

  3. Depending on the kind of environment and when/who is updating your source tables, investigate whether a truncate-and-insert approach is beneficial. Remember to mark the indexes unusable beforehand.
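
As a rough sketch of point 1 (this is not the original answerer's code): the UPDATE branch of a MERGE accepts a WHERE clause, and DECODE is a common Oracle idiom for a NULL-safe comparison. The table and column names are taken from the question; the USING query is abbreviated, and which columns to compare is an assumption — the real statement would compare every column it updates.

    merge into employee_payment ep
    using (
       select p.pad_id   vista_paydist_id,
              p.pay_id   vista_payroll_id,
              p.pay_date pay_dte,
              c.client_id
       from stg_paydist p
       join company c on c.vista_company_id = p.emp_ni
    ) ps
       on (ps.vista_paydist_id = ep.vista_paydist_id)
    when matched then
    update
       set ep.vista_payroll_id = ps.vista_payroll_id,
           ep.pay_dte          = ps.pay_dte,
           ep.client_id        = ps.client_id
       -- Only touch the row if at least one value really changed.
       -- decode(a, b, 1, 0) returns 1 when a = b and also when both are
       -- null, so the comparison is null-safe, unlike a plain a <> b.
       where decode(ep.vista_payroll_id, ps.vista_payroll_id, 1, 0) = 0
          or decode(ep.pay_dte,          ps.pay_dte,          1, 0) = 0
          or decode(ep.client_id,        ps.client_id,        1, 0) = 0;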

I think your best bet here is to exploit the patterns in your data. This is something Oracle does not know about, so you may have to get creative.

Answered by user6285587

Try using Oracle hints:

MERGE /*+ append leading(PS) use_nl(PS EP) parallel (12) */

Also try using hints to optimize the inner USING query.
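
For illustration, here is a hedged sketch of where such hints sit in the statement, using the question's tables with an abbreviated column list. The specific hints and the degree of parallelism are only examples and must be validated against your own execution plan; they can just as easily make things worse.

    -- Parallel DML is disabled by default; without this, a PARALLEL hint
    -- only parallelizes the read side of the MERGE.
    alter session enable parallel dml;

    -- append requests direct-path writes for the INSERT branch.
    merge /*+ append leading(ps) use_nl(ps ep) parallel(12) */
    into employee_payment ep
    using (
       -- Hints can also go inside the USING subquery; full scans plus a
       -- hash join are often better for bulk volumes (an assumption here).
       select /*+ full(p) use_hash(p ph) */
              p.pad_id vista_paydist_id,
              p.pay_id vista_payroll_id
       from stg_paydist p
       join payroll_header ph on ph.vista_payroll_id = p.pay_id
    ) ps
    on (ps.vista_paydist_id = ep.vista_paydist_id)
    when matched then update
       set ep.vista_payroll_id = ps.vista_payroll_id;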

Answered by Osama Lone

I was working on a similar problem, and a good solution I found was to break the query up. The primary reason big-table merges are a bad idea is the in-memory storage of the result of the USING query: the PGA fills up pretty quickly, so the database starts spilling sort operations and joins to the temporary tablespace, and temp tablespace on disk is excruciatingly slow. Excessive temp tablespace use can easily be avoided by splitting the statement into two queries. So the query below

merge into emp e
using (
   select a, b, c, d from (/* big query here */)
) ec
on ( /* conditions */ )
when matched then
/* rest of merge logic */

can become:

create table temp_big_query as select a, b, c, d from (/* big query here */);

merge into emp e
using (
   select a, b, c, d from temp_big_query
) ec
on ( /* conditions */ )
when matched then
/* rest of merge logic */

If the USING query also has CTEs and subqueries, try breaking that query up to use more temp tables like the one shown above (a sketch follows below). Also avoid parallel hints, because they mostly tend to slow the query down unless the query itself has work that can genuinely be done in parallel; try using indexes instead, and keep parallelism as the last option for optimization. I know some references are missing; please feel free to comment and add references or point out mistakes in my answer.
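
As a concrete (hypothetical) illustration of splitting a CTE out of the USING query: the table name and filter below are invented, and the dbms_stats call is an addition this answer does not mention, though it is commonly paired with freshly created staging tables so the optimizer can plan the subsequent merge sensibly.

    -- Hypothetical: a CTE that used to live inside the USING query,
    -- materialized into its own table first.
    create table temp_recent_pay as
       select pay_id, pad_id, pad_seq
       from stg_paydist
       where pad_beg_payperiod >= date '2014-01-01';  -- invented filter

    -- Fresh optimizer statistics on the new table (assumed useful here).
    begin
       dbms_stats.gather_table_stats(user, 'TEMP_RECENT_PAY');
    end;
    /

    -- The merge then reads from the small, pre-computed table.
    merge into employee_payment ep
    using (select pay_id, pad_id, pad_seq from temp_recent_pay) ps
    on (ps.pad_id = ep.vista_paydist_id)
    when matched then update
       set ep.pay_dist_seq_nbr = ps.pad_seq;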