Oracle 10g:MIN/MAX 列值估计
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1102819/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Oracle 10g: MIN/MAX column value estimation
提问by Kage
is it possible to retrieve statistics about the minimal or maximal value of a numeric column in Oracle 10g? I have found the table USER_TAB_COL_STATISTICS having a LOW_VALUE and HIGH_VALUE column, but I am not sure whether those are the values I am looking for.
是否可以检索有关 Oracle 10g 中数字列的最小值或最大值的统计信息?我发现表 USER_TAB_COL_STATISTICS 有一个 LOW_VALUE 和 HIGH_VALUE 列,但我不确定这些是否是我正在寻找的值。
I need to find an efficient way to ask the DBS for those statistics. Using a regular MIN(a) and MAX(a) query would be too slow on large tables.
我需要找到一种有效的方法来向 DBS 询问这些统计数据。在大型表上使用常规 MIN(a) 和 MAX(a) 查询会太慢。
Thanks in advance.
提前致谢。
回答by Tony Andrews
Yes, LOW_VALUE and HIGH_VALUE will tell you the minimum and maximum values in the column but:
是的,LOW_VALUE 和 HIGH_VALUE 会告诉你列中的最小值和最大值,但是:
- they are stored as RAW(32) columns, so the meaning will not be immediately apparent
- they will be as of the last time statistics were gathered for the table, so may not be accurate (unless you explicitly gather stats before using them)
- 它们存储为 RAW(32) 列,因此含义不会立即显现
- 它们将是上次为表格收集统计信息时的数据,因此可能不准确(除非您在使用它们之前明确收集统计信息)
If you index the column then MIN(a) and MAX(a) should be very fast as in this example where T1 has 50000 rows and is indexed on OBJECT_ID:
如果对列进行索引,则 MIN(a) 和 MAX(a) 应该非常快,如本示例中 T1 有 50000 行并在 OBJECT_ID 上建立索引:
SQL> select min(object_id) from t1;
MIN(OBJECT_ID)
--------------
100
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| T1_ID | 53191 | 259K| 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
419 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
The result is the same if you select the MAX instead of the MIN. However, if you select the MIN and MAX in a single select statement the result is different:
如果您选择 MAX 而不是 MIN,结果是相同的。但是,如果在单个 select 语句中选择 MIN 和 MAX,结果会有所不同:
SQL> select min(object_id), max(object_id) from t1;
MIN(OBJECT_ID) MAX(OBJECT_ID)
-------------- --------------
100 72809
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 34 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX FAST FULL SCAN| T1_ID | 53191 | 259K| 34 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
125 consistent gets
0 physical reads
0 redo size
486 bytes sent via SQL*Net to client
380 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
This suggests that it may be better to get them separately, though I haven't conclusively proved that.
这表明分开获取它们可能更好,尽管我还没有最终证明这一点。
回答by Rob van Wijk
An example with a table containing numbers from 1 up to 1234:
包含从 1 到 1234 的数字的表格的示例:
SQL> create table t (nr) as select level from dual connect by level <= 1234
2 /
Tabel is aangemaakt.
SQL> select min(nr)
2 , max(nr)
3 from t
4 /
MIN(NR) MAX(NR)
---------- ----------
1 1234
1 rij is geselecteerd.
If you analyze the table, the low_value and high_value columns contain the right numbers.
如果您分析该表,low_value 和 high_value 列包含正确的数字。
SQL> exec dbms_stats.gather_table_stats(user,'t')
PL/SQL-procedure is geslaagd.
SQL> select low_value
2 , high_value
3 from user_tab_columns
4 where table_name = 'T'
5 and column_name = 'NR'
6 /
LOW_VALUE HIGH_VALUE
---------------------------------------------------------------- ----------------
C102 C20D23
1 rij is geselecteerd.
They are raw, so they cannot be read easily. Using the utl_raw.cast_to_number function makes them readable:
它们是原始的,因此不容易阅读。使用 utl_raw.cast_to_number 函数使它们可读:
SQL> select utl_raw.cast_to_number(low_value)
2 , utl_raw.cast_to_number(high_value)
3 from user_tab_columns
4 where table_name = 'T'
5 and column_name = 'NR'
6 /
UTL_RAW.CAST_TO_NUMBER(LOW_VALUE) UTL_RAW.CAST_TO_NUMBER(HIGH_VALUE)
--------------------------------- ----------------------------------
1 1234
1 rij is geselecteerd.
However, be careful: the numbers may be inaccurate when updates have taken place between the time the statistics were gathered and the time the query ran.
但是,请注意:在收集统计信息和运行查询之间发生更新时,数字可能不准确。
Regards, Rob.
问候,罗伯。
回答by Jeffrey Kemp
The other answers here (using an index fast full scan; or examining the user_tab_columns
statistics) are excellent.
这里的其他答案(使用索引快速全扫描;或检查user_tab_columns
统计数据)非常好。
Here's another method that might be suitable - if you're only interested in a rough estimate, you can use the SAMPLE
clause (and adjust the sample size up or down depending on how accurate you need it):
这是另一种可能合适的方法 - 如果您只对粗略估计感兴趣,您可以使用该SAMPLE
条款(并根据您需要的准确程度向上或向下调整样本量):
SELECT max(value), min(value) FROM t SAMPLE(1);
This takes a 1% sample from the table. It will generally sample different rows each time it is run, so don't expect the results to be identical run-to-run. If you want it to run quicker, you can have lower sample sizes, e.g. SAMPLE(0.01)
, or if you want to sample half the table, SAMPLE(50)
.
这需要从表中抽取 1% 的样本。它通常会在每次运行时对不同的行进行采样,因此不要期望每次运行的结果都相同。如果您希望它运行得更快,您可以使用较小的样本量,例如SAMPLE(0.01)
,或者如果您想对一半的表进行抽样,则SAMPLE(50)
。
The advantage of this approach over the "analyze, then-query-user-tab-cols" approach is that the analyze runs queries like this anyway in order to generate the statistics - so doing it this way may mean less work overall.
这种方法相对于“分析,然后查询用户选项卡列”方法的优势在于,分析无论如何都会运行这样的查询以生成统计信息 - 所以这样做可能意味着整体工作量减少。
回答by Vadzim
In my case the column of interest has TIMESTAMP
type and it seems there's no UTL_RAW.CAST_TO_TIMESTAMP
function.
在我的情况下,感兴趣的列有TIMESTAMP
类型,似乎没有UTL_RAW.CAST_TO_TIMESTAMP
功能。
It helped to use the trick from http://www.oaktable.net/content/convert-rawhex-timestamp-0to convert Oracle RAW
type to TIMESTAMP
:
它有助于使用http://www.oaktable.net/content/convert-rawhex-timestamp-0 中的技巧将 OracleRAW
类型转换为TIMESTAMP
:
select to_timestamp(
to_char( to_number( substr( p_str, 1, 2 ), 'xx' ) - 100, 'fm00' ) ||
to_char( to_number( substr( p_str, 3, 2 ), 'xx' ) - 100, 'fm00' ) ||
to_char( to_number( substr( p_str, 5, 2 ), 'xx' ), 'fm00' ) ||
to_char( to_number( substr( p_str, 7, 2 ), 'xx' ), 'fm00' ) ||
to_char( to_number( substr( p_str,9, 2 ), 'xx' )-1, 'fm00' ) ||
to_char( to_number( substr( p_str,11, 2 ), 'xx' )-1, 'fm00' ) ||
to_char( to_number( substr( p_str,13, 2 ), 'xx' )-1, 'fm00' ), 'yyyymmddhh24miss' )
from (
select low_value p_str from user_tab_columns
where table_name = 'MESSAGE' and column_name = 'TS'
)