如何使用 SELECT DISTINCT 提高 Oracle 中的性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1051593/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 01:58:32  来源:igfitidea点击:

How to improve performance in Oracle using SELECT DISTINCT

performanceoracleofbiz

提问by Ismael

I'm currently working in the deployment of an OFBiz based ERP The database being used is Oracle 10g Enterprise

我目前正在部署基于 OFBiz 的 ERP 正在使用的数据库是 Oracle 10g Enterprise

One of the biggest issues is some oracle performance problems, analyzing the ofbiz logs, the following query:

最大的问题之一是一些oracle性能问题,分析ofbiz日志,查询如下:

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE ((STATUS_ID = :v0 OR STATUS_ID = :v1 OR STATUS_ID = :v2) AND 
(ORDER_TYPE_ID = :v3)) ORDER BY ORDER_DATE DESC

is very slow. We've tested executing the query without the DISTINCT and it takes about 30 seconds. There are 4.000.000+ registers in the table. There are index for the PK field orderId and almost every other field

很慢。我们已经测试在没有 DISTINCT 的情况下执行查询,大约需要 30 秒。表中有 4.000.000+ 个寄存器。PK 字段 orderId 和几乎所有其他字段都有索引

The EXPLAIN PLAN with DISTINCT is:

带有 DISTINCT 的解释计划是:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  HASH (UNIQUE) (null)
   TABLE ACCESS (FULL)  ORDER_HEADER

and without the DISTINCT is:

没有 DISTINCT 的是:

SELECT STATEMENT () (null)
 SORT (ORDER BY)    (null)
  TABLE ACCESS (FULL)   ORDER_HEADER

any ideas about tuning oracle to improve the performance of this kind of queries? It's very difficult to rewrite the query because is automatically generated by ofbiz so I think the solution is about tuning oracle

关于调整 oracle 以提高此类查询的性能的任何想法?重写查询非常困难,因为它是由 ofbiz 自动生成的,所以我认为解决方案是关于调整 oracle

thanks in advance

提前致谢

EDIT: I analyzed the query using tkprof ,as suggested by Rob van Wijk and haffax,and the result is the following

编辑:我使用 tkprof 分析了查询,正如 Rob van Wijk 和 haffax 所建议的,结果如下

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY, 
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID, 
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID, 
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE, 
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP, 
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM 
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      9.10     160.81      66729      65203         37          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      9.14     160.83      66729      65203         37          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58  

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8178        0.28        146.55
  direct path write temp                       2200        0.04          4.22
  direct path read temp                          36        0.14          2.01
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1        3.36          3.36
********************************************************************************

So it seems the problem is the 'db file scattered read', any ideas to how to tune oracle in order to reduce the wait in this event?

所以看起来问题是'db文件分散读取',关于如何调整oracle以减少这种事件中的等待的任何想法?

Follow up with the new tkprof result, this time closing the session:

跟进新的 tkprof 结果,这次关闭会话:

********************************************************************************

SELECT DISTINCT ORDER_ID, ORDER_TYPE_ID, ORDER_NAME, EXTERNAL_ID,
 SALES_CHANNEL_ENUM_ID, ORDER_DATE, ENTRY_DATE, VISIT_ID, STATUS_ID, CREATED_BY,
 FIRST_ATTEMPT_ORDER_ID, CURRENCY_UOM, SYNC_STATUS_ID, BILLING_ACCOUNT_ID,
 ORIGIN_FACILITY_ID, WEB_SITE_ID, PRODUCT_STORE_ID, TERMINAL_ID, TRANSACTION_ID,
 AUTO_ORDER_SHOPPING_LIST_ID, NEEDS_INVENTORY_ISSUANCE, IS_RUSH_ORDER, INTERNAL_CODE,
 REMAINING_SUB_TOTAL, GRAND_TOTAL, LAST_UPDATED_STAMP, LAST_UPDATED_TX_STAMP, CREATED_STAMP,
CREATED_TX_STAMP, RECIBIR_BODEGAL, RECEPCIONADA_BODEGAL, FECHA_RECEPCION_BODEGAL FROM
ERP.ORDER_HEADER WHERE STATUS_ID = 'ORDER_COMPLETED' ORDER BY ORDER_DATE DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.03       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        1      8.23      47.66      66576      65203         31          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.26      47.68      66576      65203         31          50

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 58 

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY (cr=65203 pr=66576 pw=75025 time=47666679 us)
3456659   TABLE ACCESS FULL ORDER_HEADER (cr=65203 pr=65188 pw=0 time=20757300 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file scattered read                       8179        0.14         34.96
  direct path write temp                       2230        0.00          3.91
  direct path read temp                          52        0.14          0.84
  SQL*Net more data to client                     3        0.00          0.00
  SQL*Net message from client                     1     1510.62       1510.62
********************************************************************************

采纳答案by Rob van Wijk

If the difference between the two queries is substantial, that would be surprising. You mention that the query without DISTINCT takes about 30 seconds. How much time does the query with the DISTINCT take?

如果两个查询之间的差异很大,那将是令人惊讶的。您提到没有 DISTINCT 的查询大约需要 30 秒。带有 DISTINCT 的查询需要多长时间?

Can you show the tkprof output of the query with the DISTINCT, after you traced the session with a "alter session set events '10046 trace name context forever, level 8'", and disconnect after the query has finished? This way we can see where time is actually being spent and if it was waiting for something ("direct path read temp" maybe?)

在您使用“alter session set events '10046 trace name context ever, level 8'”跟踪会话并在查询完成后断开连接后,您能否使用 DISTINCT 显示查询的 tkprof 输出?通过这种方式,我们可以看到时间实际上花在了哪里,以及它是否在等待某些东西(“直接路径读取温度”可能?)

Regards, Rob.

问候,罗布。



Followup, after the tkprof file was posted:

后续,在 tkprof 文件发布后:

I see you managed to get the tkprof output, but unfortunately you didn't disconnect your session before creating the tkprof file. Now, the cursor was left open and it failed to write STAT# lines to your trace file. This is why you don't have a plan / row source operation in your tkprof file. It would be nice if you can repeat the process, if the suggestion below turns out to be rubbish.

我看到您设法获得了 tkprof 输出,但不幸的是,您在创建 tkprof 文件之前没有断开会话。现在,光标保持打开状态,并且无法将 STAT# 行写入您的跟踪文件。这就是为什么您的 tkprof 文件中没有计划/行源操作的原因。如果下面的建议被证明是垃圾的话,如果你能重复这个过程就好了。

A little speculation from my side: I think the DISTINCT is almost a no-op because you are selecting so many columns. If this is true, then your predicate "WHERE STATUS_ID = 'ORDER_COMPLETED'" is very selective and you will benefit from having an index on this column. After you create the index, make sure you analyze it properly, maybe even with a histogram on if data values are skewed. The end result will be a different plan for this query, starting with an INDEX RANGE SCAN, followed by a TABLE ACCESS BY ROWID, leading to a very fast query.

我这边的一点推测:我认为 DISTINCT 几乎是一个空操作,因为您选择了这么多列。如果这是真的,那么您的谓词“WHERE STATUS_ID = 'ORDER_COMPLETED'”是非常有选择性的,您将受益于在此列上建立索引。创建索引后,请确保对其进行了正确分析,甚至可以使用直方图来显示数据值是否有偏差。最终结果将是此查询的不同计划,从 INDEX RANGE SCAN 开始,然后是 TABLE ACCESS BY ROWID,从而实现非常快速的查询。

After you have created an index, you can have the table re-analyzed, including histograms using this statement:

创建索引后,您可以使用以下语句重新分析表,包括直方图:

exec dbms_stats.gather_table_stats([owner],[table_name],cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE ')

exec dbms_stats.gather_table_stats([owner],[table_name],cascade=>true,method_opt=>'FOR ALL INDEXED COLUMNS SIZE')

Regards, Rob.

问候,罗布。

回答by haffax

Since you're ordering results according to order_dateit is important that you have a descending index on that field. Also tell oracle that you wish to use this index. Place the order_datefield first in the query and use a hint like

由于您根据结果对结果进行排序,order_date因此在该字段上有一个降序索引很重要。还要告诉 oracle 您希望使用此索引。将order_date字段放在查询的第一位并使用类似的提示

SELECT /*+ index(HEADERS IDX_ORDER_DATE_DESC) */ ... 
FROM ERP.ORDER_HEADER HEADERS
WHERE ...
ORDER BY ORDER_DATE DESC

It is not so much about having indices, but rather about telling oracle to use them. Oracle is very picky about indices. You get the best results when you choose indices according to your most important queries. If in doubt, trace a query. This way you can see in what part of the query oracle spends the most time and whether your indices are actually picked up or not. Tracing is invaluable when fighting performance problems.

与其说拥有索引,不如说是告诉 oracle 使用它们。Oracle 对索引非常挑剔。当您根据最重要的查询选择索引时,您将获得最佳结果。如果有疑问,请跟踪查询。通过这种方式,您可以看到查询 oracle 的哪个部分花费的时间最多,以及您的索引是否被实际选中。在解决性能问题时,跟踪是无价的。

回答by Tony Andrews

Is ORDER_ID declared as the PK using a PRIMARY KEY constraint? Because if it is I would expect the optimiser to recognise that the DISTINCT is superfluous in this query and optimise it out. Without the constraint, it won't know it is superfluous and so will expend unnecessary and considerable effort in "de-duping" the results.

ORDER_ID 是否使用 PRIMARY KEY 约束声明为 PK?因为如果是的话,我希望优化器能够识别出 DISTINCT 在此查询中是多余的,并将其优化掉。没有约束,它不会知道它是多余的,因此会花费不必要的和相当大的努力来“去除重复”结果。

回答by Sergey

Try to disable hash aggregation:

尝试禁用哈希聚合:

select /*+ no_use_hash_aggregation*/ distinct ...

http://oracle-randolf.blogspot.com/2011/01/hash-aggregation.html

http://oracle-randolf.blogspot.com/2011/01/hash-aggregation.html

回答by dpbradley

When troubleshooting applications where I don't have control of the SQL I find that the dbms_sqltune package saves a lot of time. See http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_sqltun.htm, and yes, unfortunately you should be licensed to use it.

在对我无法控制 SQL 的应用程序进行故障排除时,我发现 dbms_sqltune 包可以节省大量时间。请参阅http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_sqltun.htm,是的,不幸的是,您应该获得使用它的许可。

There are procedures in this package to run a tuning analysis against a specific sql_id in the shared pool or the AWR repository. The analysis will contain indexing recommendations if there are any improvements to be made with additional indexes. More importantly, the analyzer might discover an improved access path that it can implement with what Oracle calls a SQL Profile - this is a set of hints that will be stored and used whenever this sql_id is executed. This happens without requiring the hints to be coded in the SQL statement, and there's also an option to do what you can think of as fuzzy matching if your application generates literal values in the statement instead of bind variables.

此包中有一些过程可以针对共享池或 AWR 存储库中的特定 sql_id 运行调优分析。如果要对其他索引进行任何改进,分析将包含索引建议。更重要的是,分析器可能会发现一个改进的访问路径,它可以使用 Oracle 称为 SQL 配置文件的内容来实现——这是一组提示,每当执行此 sql_id 时将存储和使用。这不需要在 SQL 语句中编码提示,如果您的应用程序在语句中生成文字值而不是绑定变量,那么还有一个选项可以执行您可以认为是模糊匹配的操作。

Of course this tool shouldn't be a substitute for understanding the application and it's data structures, but reading through the output that details the execution path of a better plan (if found) can be educational.

当然,此工具不应替代理解应用程序及其数据结构,但通读详细说明更好计划(如果找到)的执行路径的输出可以具有教育意义。

回答by Aurelio Martin Massoni

Oracle is accessing the whole table each time you run the query ( TABLE ACCESS (FULL) ). Creating an INDEX on the STATUS_ID and ORDER_TYPE_ID columns

每次运行查询( TABLE ACCESS (FULL) )时,Oracle 都会访问整个表。在 STATUS_ID 和 ORDER_TYPE_ID 列上创建 INDEX

CREATE INDEX ERP.ORDER_HEADER_I1 ON ERP.ORDER_HEADER ( STATUS_ID, ORDER_TYPE_ID );

will help a lot, especially if there are several different values of STATUS_ID and ORDER_TYPE_ID in the ORDER_HEADER table.

会很有帮助,特别是如果 ORDER_HEADER 表中有几个不同的 STATUS_ID 和 ORDER_TYPE_ID 值。