oracle 收集分区表的统计信息

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/41415422/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 03:21:45  来源:igfitidea点击:

Gather stats on partition table

oracleoracle11g

提问by Prats

I have table ABCinterval partition by each day. Each partition is used in queries for that specific day. Even if i schedule job @nyt each day to gather stats then queries which are using that table before stats gather will not use optimal plan.

我每天都有表ABC间隔分区。每个分区用于特定日期的查询。即使我每天安排工作@nyt 来收集统计信息,那么在统计信息收集之前使用该表的查询也不会使用最佳计划。

回答by Marmite Bomber

Gathering optimizer statistics on a partitioned table is not a trivial task with some caveats. Particularly on a daily partitioned schema it could be not the best solution to gather partitions statistics once per day.

收集分区表的优化器统计信息并不是一项有一些警告的微不足道的任务。特别是在每日分区模式上,每天收集一次分区统计信息可能不是最佳解决方案。

To demonstrate it, let assume we have not daily schema but yearly partitions of transaction data. The question is, would it be OK to gather statistics say on 1. January (or 1. June or 31. December)? The answer is definitively NO, as in the first case the partition would be considered (nearly) empty, in the latter cases the statistic would be realistic, but they were gathered too late.

为了演示它,假设我们没有每日模式,而是交易数据的年度分区。问题是,是否可以在 1 月 1 日(或 1 月 1 日或 12 月 31 日)收集统计数据?答案肯定NO,因为在第一种情况下,分区将被认为(几乎)为空,在后一种情况下,统计数据是现实的,但收集得太晚了。

With this in mind there are IMO three possible approaches to handle it

考虑到这一点,IMO 有三种可能的方法来处理它

1) do not gather statistics at all (and use dynamic sampling)

1)根本不收集统计数据(并使用动态采样)

2) gather partition statistics repeatedly (say each hour)

2)重复收集分区统计信息(比如每小时)

3) do not gather statistics, but set them so, that the queries perform fine

3)不收集统计信息,而是设置它们,以便查询执行良好

The best option depends on your data and access pattern, so I only consider some details to the implementation of those options.

最佳选择取决于您的数据和访问模式,因此我只考虑实现这些选项的一些细节。

Sample Data

样本数据

Lets generate a table with one full and one nearly empty daily partitions.

让我们生成一个表,其中包含一个完整的和一个几乎空的日常分区。

The table has a local index on the GROUP_IDcolumn. The purpose of the exercise is to get a FULL TABLE SCANwhile accessing the small partition and an INDEX ACCESSwhile accessing the big partition.

该表在GROUP_ID列上有一个本地索引。练习的目的是得到FULL TABLE SCAN一会儿访问小分区和INDEX ACCESS一会儿访问大分区。

CREATE TABLE  mytab
   (    id number not null,
      group_id number,
      trans_date date,
      pad varchar2(4000))
PARTITION BY RANGE (trans_date)
INTERVAL (NUMTODSINTERVAL(1,'DAY'))
(
   PARTITION part_01 values LESS THAN (TO_DATE('31-12-2016','DD-MM-YYYY'))
);

create index mytab_idx1 on mytab(id) local;
create index mytab_idx2 on mytab(group_id) local;

-- full day partition
insert into mytab (id, group_id, trans_date, pad)
select rownum id, trunc(rownum/1000) group_id, to_date('31122016','ddmmyyyy'), lpad('x',3000,'x') from dual 
connect by  level <= 100000;
commit;

-- nearly empty day partition
insert into mytab (id, group_id, trans_date, pad)
select rownum id, trunc(rownum/1000) group_id, to_date('01012017','ddmmyyyy'), lpad('x',3000,'x') from dual
connect by  level <= 1000; 
commit;

Dynamic Sampling

动态采样

If the target object have no statistics at all, Oracle performs dynamic sampling (aka dynamic statistics) With a little overhead Oracle calculates the statistics while parsing the statement. So it can't be stale.

如果目标对象根本没有统计信息,Oracle 会执行动态采样(又名动态统计),Oracle 会在解析语句的同时计算统计信息。所以它不可能是陈旧的。

Accessing the nearly empty partition Oracle proper choose FULL TABLE SCAN

访问几乎空的分区 Oracle 正确选择 FULL TABLE SCAN

EXPLAIN PLAN  SET STATEMENT_ID = 'jara1' into   plan_table  FOR
select   * from mytab 
where trans_date = TO_DATE('01-01-2017','DD-MM-YYYY') and group_id = 0;

SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL')); 

Plan hash value: 4018216072

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |   958 |  1905K|   274   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|       |   958 |  1905K|   274   (0)| 00:00:01 |     3 |     3 |
|*  2 |   TABLE ACCESS FULL    | MYTAB |   958 |  1905K|   274   (0)| 00:00:01 |     3 |     3 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANS_DATE"=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
              AND "GROUP_ID"=0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

... while accessing the full partition INDEX ACCESSis used

...在访问完整分区时INDEX ACCESS使用

EXPLAIN PLAN  SET STATEMENT_ID = 'jara1' into   plan_table  FOR
select * from mytab 
where trans_date = TO_DATE('31-12-2016','DD-MM-YYYY') and group_id = 0;

SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));

Plan hash value: 984912596

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |            |  1608 |  3198K|  9021   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |            |  1608 |  3198K|  9021   (1)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| MYTAB      |  1608 |  3198K|  9021   (1)| 00:00:01 |     2 |     2 |
|*  3 |    INDEX RANGE SCAN                        | MYTAB_IDX2 |  1608 |       |  2880   (1)| 00:00:01 |     2 |     2 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANS_DATE"=TO_DATE(' 2016-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   3 - access("GROUP_ID"=0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

So we see that the dynamic sampling works fine, choosing the proper access method.

所以我们看到动态采样工作正常,选择了正确的访问方法。

Gather Partition Statistics Frequently

经常收集分区统计信息

Repeating the gathering job mitigates the problem, that the partition is growing constantly.

重复收集作业可以缓解分区不断增长的问题。

The period depends on the transaction rate.

期间取决于交易率。

Example of gathering statistics for one partition only

仅收集一个分区的统计信息示例

 exec dbms_stats.gather_table_stats(OWNNAME=>user,TABNAME=>'MYTAB', PARTNAME=>'SYS_P10030',   CASCADE=> TRUE); 

The worst case that must be avoided is that *the statistics pointing that the partition is empty, but (in the meantime) the partition is heavy populated.

必须避免的最坏情况是 * 统计数据表明该分区为空,但(同时)该分区被大量填充。

Set Statistics

设置统计

This approach assumes, that the "correct" access path for the queries is known. In our example we may access a nearly empty partition with a FULL TABLE SCAN, but the index access is fine for such partition as well. So we can set partition statistics so, that always an INDEX ACCESS will be done.

这种方法假设查询的“正确”访问路径是已知的。在我们的示例中,我们可以使用 访问一个几乎为空的分区FULL TABLE SCAN,但索引访问也适用于此类分区。因此,我们可以设置分区统计信息,以便始终执行 INDEX ACCESS。

One possible (very simple) schema is to copy the statistics from the previous day.

一种可能(非常简单)的模式是复制前一天的统计数据。

This call copies the statistics from partition SYS_P10029to partition SYS_P10030

此调用将统计信息从一个分区复制SYS_P10029到另一个分区SYS_P10030

 exec DBMS_STATS.COPY_TABLE_STATS (OWNNAME=>user,TABNAME=>'MYTAB',srcpartname=>'SYS_P10029',dstpartname=> 'SYS_P10030');    

So in other word, immediately after the creation of the partition the statistics are initiated as for the full populated partition.

因此,换句话说,在分区创建后立即启动统计信息,如完整填充的分区。

回答by Wernfried Domscheit

In my application I run this procedure once a day by scheduler job. It collects statistics for most recent partition.

在我的应用程序中,我每天通过调度程序作业运行一次此过程。它收集最近分区的统计信息。

PROCEDURE GatherIndexStats IS

    CURSOR IndexPartition(indName IN VARCHAR2) IS
    SELECT INDEX_NAME, PARTITION_NAME
    FROM USER_IND_STATISTICS i
        JOIN USER_TAB_PARTITIONS t USING (TABLE_NAME, PARTITION_NAME)
    WHERE TABLE_NAME = 'ABC'
        AND i.LAST_ANALYZED IS NULL
        AND OBJECT_TYPE = 'PARTITION'
        AND INDEX_NAME = indName
    ORDER BY INDEX_NAME, PARTITION_NAME DESC
    OFFSET 1 ROW FETCH FIRST 2 ROW ONLY;

BEGIN

    FOR aIndex IN (SELECT INDEX_NAME FROM USER_INDEXES WHERE TABLE_NAME = 'ABC') LOOP
        FOR aInd IN IndexPartition(aIndex.INDEX_NAME) LOOP
            DBMS_STATS.GATHER_INDEX_STATS(USER, aInd.INDEX_NAME, aInd.PARTITION_NAME);
        END LOOP;
    END LOOP;   

END GatherIndexStats;

In my application I need to get only Index statistics rather than full table stats. In case you like to get Index and table stats, use procedure below:

在我的应用程序中,我只需要获取索引统计信息而不是全表统计信息。如果您想获取索引和表统计信息,请使用以下过程:

PROCEDURE GatherTableStats IS

    CURSOR TablePartition IS
    SELECT INDEX_NAME, PARTITION_NAME
    FROM USER_TAB_STATISTICS i
        JOIN USER_TAB_PARTITIONS t USING (TABLE_NAME, PARTITION_NAME)
    WHERE TABLE_NAME = 'ABC'
        AND i.LAST_ANALYZED IS NULL
        AND OBJECT_TYPE = 'PARTITION'
    ORDER BY PARTITION_NAME DESC
    OFFSET 1 ROW FETCH FIRST 2 ROW ONLY;

BEGIN

    FOR aPart IN TablePartition LOOP
        DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABC', aPart.PARTITION_NAME);
    END LOOP;   

END GatherTableStats;

回答by Jon Heller

Statistics should be gathered as a part of any process that significantly changes the data. Do not rely on nightly jobs to gather statistics, especially in a large data warehouse.

统计数据应作为任何显着更改数据的过程的一部分进行收集。不要依赖夜间作业来收集统计信息,尤其是在大型数据仓库中。

Gathering statistics only in nightly jobs has many potential disadvantages:

仅在夜间工作中收集统计数据有许多潜在的缺点:

  1. The processing has a weird time dependency. Statistics windows can be tricky to coordinate. And sometimes if there's too much work the table you care about may not have time to get analyzed.
  2. There are several types of statistics jobs (scheduler jobs, DBA_JOBS, auto_tasks), all of which tend to get disabled more than they should.
  3. Gathering statistics at the wrong time is much worsethan not having statistics at all. If there are no statistics then Oracle can use dynamic sampling to do a decent job. But if the nightly job just happens to gather statistics during the brief period where the table is empty, the statistics may be horribly wrong and performance will suffer. I've seen this happen many times; these errors tend to get blamed on "environmental differences", but if you leave a critical step up to chance then environments are going to randomly fail.
  1. 处理具有奇怪的时间依赖性。统计窗口可能很难协调。有时,如果工作量太大,您关心的表格可能没有时间进行分析。
  2. 有几种类型的统计作业(调度程序作业、DBA_JOBS、auto_tasks),所有这些都倾向于被更多地禁用。
  3. 在错误的时间收集统计数据比根本没有统计数据要糟糕得多。如果没有统计数据,那么 Oracle 可以使用动态采样来做一个体面的工作。但是如果每晚的工作恰好在表为空的短暂时间内收集统计数据,那么统计数据可能会非常错误并且性能会受到影响。我已经看到这种情况发生很多次了;这些错误往往被归咎于“环境差异”,但如果您将关键步骤留给机会,那么环境将随机失败。

Gathering statistics as part of your data load process has many potential advantages. Since you understand the process and the table better than some generic nightly statistics job you can take advantage of many advanced features:

在数据加载过程中收集统计数据具有许多潜在优势。由于您比一些通用的夜间统计工作更了解流程和表格,因此您可以利用许多高级功能:

  1. If the system isn't busy after the data load then parallelism can be used with a parameter like DEGREE=>8.
  2. If it's a direct-path write in 12c you may be able to automatically gather stats while loading data with the GATHER_OPTIMIZER_STATISTICS hint.
  3. If it's an interval partitioned table you may want to setup incremental statistics gathering. This lets the process only spend time gathering statistics for the partition and the global statistics are updated for free.
  4. If the process disabled and rebuilds indexes it can avoid re-gathering index statistics with the parameter NOCASCADE=>TRUE.
  1. 如果系统在数据加载后不忙,那么并行性可以与类似 DEGREE=>8 的参数一起使用。
  2. 如果是 12c 中的直接路径写入,您可以在使用 GATHER_OPTIMIZER_STATISTICS 提示加载数据时自动收集统计信息。
  3. 如果它是一个间隔分区表,您可能需要设置增量统计信息收集。这让进程只花时间收集分区的统计信息,并且免费更新全局统计信息。
  4. 如果进程禁用并重建索引,它可以避免使用参数 NOCASCADE=>TRUE 重新收集索引统计信息。

Don't outsource statistics gathering to some other scheduled job. Statistics are so important and tricky that they should be fully integrated with any program that is making significant data changes.

不要将统计数据收集外包给其他一些预定的工作。统计是如此重要和棘手,以至于它们应该与进行重大数据更改的任何程序完全集成。