Oracle 10g:MIN/MAX 列值估计

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1102819/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 18:33:51  来源:igfitidea点击:

Oracle 10g: MIN/MAX column value estimation

oracleoracle10g

提问by Kage

is it possible to retrieve statistics about the minimal or maximal value of a numeric column in Oracle 10g? I have found the table USER_TAB_COL_STATISTICS having a LOW_VALUE and HIGH_VALUE column, but I am not sure whether those are the values I am looking for.

是否可以检索有关 Oracle 10g 中数字列的最小值或最大值的统计信息?我发现表 USER_TAB_COL_STATISTICS 有一个 LOW_VALUE 和 HIGH_VALUE 列,但我不确定这些是否是我正在寻找的值。

I need to find an efficient way to ask the DBS for those statistics. Using a regular MIN(a) and MAX(a) query would be too slow on large tables.

我需要找到一种有效的方法来向 DBS 询问这些统计数据。在大型表上使用常规 MIN(a) 和 MAX(a) 查询会太慢。

Thanks in advance.

提前致谢。

回答by Tony Andrews

Yes, LOW_VALUE and HIGH_VALUE will tell you the minimum and maximum values in the column but:

是的,LOW_VALUE 和 HIGH_VALUE 会告诉你列中的最小值和最大值,但是

  • they are stored as RAW(32) columns, so the meaning will not be immediately apparent
  • they will be as of the last time statistics were gathered for the table, so may not be accurate (unless you explicitly gather stats before using them)
  • 它们存储为 RAW(32) 列,因此含义不会立即显现
  • 它们将是上次为表格收集统计信息时的数据,因此可能不准确(除非您在使用它们之前明确收集统计信息)

If you index the column then MIN(a) and MAX(a) should be very fast as in this example where T1 has 50000 rows and is indexed on OBJECT_ID:

如果对列进行索引,则 MIN(a) 和 MAX(a) 应该非常快,如本示例中 T1 有 50000 行并在 OBJECT_ID 上建立索引:

SQL> select min(object_id) from t1;

MIN(OBJECT_ID)
--------------
           100

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |       |     1 |     5 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T1_ID | 53191 |   259K|     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        419  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The result is the same if you select the MAX instead of the MIN. However, if you select the MIN and MAX in a single select statement the result is different:

如果您选择 MAX 而不是 MIN,结果是相同的。但是,如果在单个 select 语句中选择 MIN 和 MAX,结果会有所不同:

SQL> select min(object_id), max(object_id) from t1;

MIN(OBJECT_ID) MAX(OBJECT_ID)
-------------- --------------
           100          72809


-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |     5 |    34   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |     5 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_ID | 53191 |   259K|    34   (0)| 00:00:01 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        125  consistent gets
          0  physical reads
          0  redo size
        486  bytes sent via SQL*Net to client
        380  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

This suggests that it may be better to get them separately, though I haven't conclusively proved that.

这表明分开获取它们可能更好,尽管我还没有最终证明这一点。

回答by Rob van Wijk

An example with a table containing numbers from 1 up to 1234:

包含从 1 到 1234 的数字的表格的示例:

SQL> create table t (nr) as select level from dual connect by level <= 1234
  2  /

Tabel is aangemaakt.

SQL> select min(nr)
  2       , max(nr)
  3    from t
  4  /

   MIN(NR)    MAX(NR)
---------- ----------
         1       1234

1 rij is geselecteerd.

If you analyze the table, the low_value and high_value columns contain the right numbers.

如果您分析该表,low_value 和 high_value 列包含正确的数字。

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL-procedure is geslaagd.

SQL> select low_value
  2       , high_value
  3    from user_tab_columns
  4   where table_name = 'T'
  5     and column_name = 'NR'
  6  /

LOW_VALUE                                                        HIGH_VALUE
---------------------------------------------------------------- ----------------
C102                                                             C20D23

1 rij is geselecteerd.

They are raw, so they cannot be read easily. Using the utl_raw.cast_to_number function makes them readable:

它们是原始的,因此不容易阅读。使用 utl_raw.cast_to_number 函数使它们可读:

SQL> select utl_raw.cast_to_number(low_value)
  2       , utl_raw.cast_to_number(high_value)
  3    from user_tab_columns
  4   where table_name = 'T'
  5     and column_name = 'NR'
  6  /

UTL_RAW.CAST_TO_NUMBER(LOW_VALUE) UTL_RAW.CAST_TO_NUMBER(HIGH_VALUE)
--------------------------------- ----------------------------------
                                1                               1234

1 rij is geselecteerd.

However, be careful: the numbers may be inaccurate when updates have taken place between the time the statistics were gathered and the time the query ran.

但是,请注意:在收集统计信息和运行查询之间发生更新时,数字可能不准确。

Regards, Rob.

问候,罗伯。

回答by Jeffrey Kemp

The other answers here (using an index fast full scan; or examining the user_tab_columnsstatistics) are excellent.

这里的其他答案(使用索引快速全扫描;或检查user_tab_columns统计数据)非常好。

Here's another method that might be suitable - if you're only interested in a rough estimate, you can use the SAMPLEclause (and adjust the sample size up or down depending on how accurate you need it):

这是另一种可能合适的方法 - 如果您只对粗略估计感兴趣,您可以使用该SAMPLE条款(并根据您需要的准确程度向上或向下调整样本量):

SELECT max(value), min(value) FROM t SAMPLE(1);

This takes a 1% sample from the table. It will generally sample different rows each time it is run, so don't expect the results to be identical run-to-run. If you want it to run quicker, you can have lower sample sizes, e.g. SAMPLE(0.01), or if you want to sample half the table, SAMPLE(50).

这需要从表中抽取 1% 的样本。它通常会在每次运行时对不同的行进行采样,因此不要期望每次运行的结果都相同。如果您希望它运行得更快,您可以使用较小的样本量,例如SAMPLE(0.01),或者如果您想对一半的表进行抽样,则SAMPLE(50)

The advantage of this approach over the "analyze, then-query-user-tab-cols" approach is that the analyze runs queries like this anyway in order to generate the statistics - so doing it this way may mean less work overall.

这种方法相对于“分析,然后查询用户选项卡列”方法的优势在于,分析无论如何都会运行这样的查询以生成统计信息 - 所以这样做可能意味着整体工作量减少。

回答by Vadzim

In my case the column of interest has TIMESTAMPtype and it seems there's no UTL_RAW.CAST_TO_TIMESTAMPfunction.

在我的情况下,感兴趣的列有TIMESTAMP类型,似乎没有UTL_RAW.CAST_TO_TIMESTAMP功能。

It helped to use the trick from http://www.oaktable.net/content/convert-rawhex-timestamp-0to convert Oracle RAWtype to TIMESTAMP:

它有助于使用http://www.oaktable.net/content/convert-rawhex-timestamp-0 中的技巧将 OracleRAW类型转换为TIMESTAMP

select to_timestamp(
        to_char( to_number( substr( p_str, 1, 2 ), 'xx' ) - 100, 'fm00' ) ||
        to_char( to_number( substr( p_str, 3, 2 ), 'xx' ) - 100, 'fm00' ) ||
        to_char( to_number( substr( p_str, 5, 2 ), 'xx' ), 'fm00' ) ||
        to_char( to_number( substr( p_str, 7, 2 ), 'xx' ), 'fm00' ) ||
        to_char( to_number( substr( p_str,9, 2 ), 'xx' )-1, 'fm00' ) ||
        to_char( to_number( substr( p_str,11, 2 ), 'xx' )-1, 'fm00' ) ||
        to_char( to_number( substr( p_str,13, 2 ), 'xx' )-1, 'fm00' ), 'yyyymmddhh24miss' )
from (
select low_value p_str from user_tab_columns
   where table_name = 'MESSAGE' and column_name = 'TS'
)