SQL 在 Teradata 中使用 COLLECT STATISTICS
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16674017/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Using COLLECT STATISTICS in Teradata
提问by ChrisCamp
In Teradata I can use a statement like ...
在 Teradata 中,我可以使用类似...
collect statistics on my_table column(col1)
This will gather stats on the table and store them in DBC views like ColumnStats, IndexStats and MultiColumnStats. I'm also under the impression that the optimizer (parsing engine) will find the statistics when they are available and use them instead of estimated table cardinality/index value counts to make better decisions on how to execute a query.
这将收集表上的统计信息并将它们存储在 DBC 视图中,如 ColumnStats、IndexStats 和 MultiColumnStats。我还认为优化器(解析引擎)会在可用时找到统计信息,并使用它们而不是估计的表基数/索引值计数来更好地决定如何执行查询。
This all sounds great, but I have some questions.
这一切听起来不错,但我有一些问题。
- are there any disadvantages to using
collect stats
? - When is it appropriate/inappropriate to use collect statistics in your SQL scripting?
- What's the performance benefit to collect statistics on a field that's already indexed?
- How long are statistics stored for (table, volatile tables)?
- Any other comments concerning
collect statistics
would be appreciated.
- 使用有什么缺点
collect stats
吗? - 什么时候在 SQL 脚本中使用收集统计数据合适/不合适?
- 在已编入索引的字段上收集统计信息有什么性能优势?
- (表,易失性表)的统计信息存储多长时间?
- 任何其他有关的意见
collect statistics
将不胜感激。
回答by user2407394
1>are there any disadvantages to using collect stats?
1>使用收集统计数据有什么缺点吗?
Yes, collect stats itself is time consuming, it actually locate data from AMPS and insert the stats in dictionary tables.
是的,收集统计信息本身很耗时,它实际上是从 AMPS 中定位数据并将统计信息插入字典表中。
Suppose you have a table definition like:
假设您有一个表定义,如:
ct t1(x1 int,y1 int, z1 int);
ct t1(x1 int,y1 int, z1 int);
The table contains millions of rows and z1 is never used in the ST/Join conditions, then it is not worth to collect stats on z1.
该表包含数百万行,并且在 ST/Join 条件下从未使用 z1,因此在 z1 上收集统计信息是不值得的。
2>When is it appropriate/inappropriate to use collect statistics in your SQL scripting?
2>什么时候在你的SQL脚本中使用collect statistics合适/不合适?
Already answered above. If a column is going to be used as ST/Join condition .i.e in where or on clause, you must collect stats, otherwise not needed.
上面已经回答了。如果列将用作 ST/Join 条件 .ie 在 where 或 on 子句中,则必须收集统计信息,否则不需要。
3>What's the performance benefit to collect statistics on a field that's already indexed?
3>对已编入索引的字段收集统计信息有什么性能优势?
ct t1(x1 int,y1 int) primary index(x1);
ct t1(x1 int,y1 int) 主索引(x1);
for a simple query like sel * from t1 where x1 = 5;
对于像 sel * from t1 where x1 = 5 这样的简单查询;
will demonstrate the usefulness of collect stats.
将展示收集统计数据的用处。
How?
如何?
the optimizer can correctly estimate how many rows this query will select and if t1 is going to be joined with say t2, a efficient join will be chosen by optimizer.
优化器可以正确估计此查询将选择多少行,如果 t1 将要与 t2 连接,优化器将选择一个有效的连接。
4>How long are statistics stored for (table, volatile tables)?
4>(表,易失性表)的统计信息存储多长时间?
Table : permanently.
表:永久。
volatile tables: till session expires.
易失性表:直到会话过期。
5>Any other comments concerning collect statistics would be appreciated.
5>关于收集统计数据的任何其他评论将不胜感激。
Nothing has been discussed about multicolumn stats.
没有讨论多列统计信息。
Say, the query is like:
说,查询是这样的:
sel * from t1 join t2 on y1=y2 and x1=2;
sel * from t1 join t2 on y1=y2 and x1=2;
then collecting multi-column stats on (x1,y1) would be quite helpful in optimization.
那么在 (x1,y1) 上收集多列统计信息对优化非常有帮助。
Also, if table demography has been changed (increased number of rows) you must consider re-collecting the stats
此外,如果表格人口统计已更改(行数增加),您必须考虑重新收集统计信息