Why does Oracle use DBMS_STATS.GATHER_TABLE_STATS?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/15575422/

Date: 2020-09-08 08:54:48  Source: igfitidea

Tags: database, performance, oracle

Question by Manth

I found documents explaining that Oracle uses these for performance tuning, etc., but I didn't quite understand what it actually does.

Can someone explain it to me in simple words, with a very basic example?

Answer by Justin Cave

Most enterprise databases, Oracle included, use a cost-based optimizer to determine the appropriate query plan for a given SQL statement. This means that the optimizer uses information about the data to determine how to execute a query rather than relying on rules (this is what the older rule-based optimizer did).

For example, imagine a table for a simple bug-tracking application:

CREATE TABLE issues (
  issue_id number primary key,
  issue_text clob,
  issue_status varchar2(10)
);

CREATE INDEX idx_issue_status
    ON issues( issue_status );

If I'm a large company, I might have 1 million rows in this table. Of those, 100 have an issue_status of ACTIVE, 10,000 have an issue_status of QUEUED, and 989,900 have a status of COMPLETE. If I want to run a query against the table to find my active issues

SELECT *
  FROM issues
 WHERE issue_status = 'ACTIVE'

the optimizer has a choice. It can either use the index on issue_status and then do a single-row lookup in the table for each row in the index that matches, or it can do a table scan on the issues table. Which plan is more efficient will depend on the data that is in the table. If Oracle expects the query to return a small fraction of the data in the table, using the index would be more efficient. If Oracle expects the query to return a substantial fraction of the data in the table, a table scan would be more efficient.
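
To see which of these two plans the optimizer actually picks, you can ask Oracle for the execution plan instead of running the query. This is a minimal sketch, assuming the issues table and idx_issue_status index above exist and that you run it from SQL*Plus or SQL Developer; EXPLAIN PLAN and DBMS_XPLAN are standard Oracle features, but the exact plan output depends on your version and your data.

```sql
-- Record the optimizer's chosen plan in PLAN_TABLE without executing the query
EXPLAIN PLAN FOR
SELECT *
  FROM issues
 WHERE issue_status = 'ACTIVE';

-- Format and display the plan that was just recorded
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows an INDEX RANGE SCAN on IDX_ISSUE_STATUS followed by TABLE ACCESS BY INDEX ROWID (TABLE ACCESS BY INDEX ROWID BATCHED in newer releases), the optimizer chose the index; TABLE ACCESS FULL on ISSUES means it chose the table scan. The Rows column is the optimizer's estimate of how many rows each step returns, and that estimate is what the statistics influence.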

DBMS_STATS.GATHER_TABLE_STATS is what gathers the statistics that allow Oracle to make this determination. It tells Oracle that there are roughly 1 million rows in the table, that there are 3 distinct values for the issue_status column, and, if it gathers a histogram on issue_status, that the data is very unevenly distributed. So Oracle knows to use the index for the query that finds all the active issues. But it also knows that when you turn around and try to look for all the completed issues

SELECT *
  FROM issues
 WHERE issue_status = 'COMPLETE'

that it will be more efficient to do a table scan.
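
The whole example can be reproduced with a short script. This sketch assumes the issues table above exists and is empty, and that you are gathering statistics on a table in your own schema; the generated rows are hypothetical test data chosen to match the 100 / 10,000 / 989,900 split described above.

```sql
-- Hypothetical test data: issue_id 1..1,000,000 built from two 1,000-row generators
INSERT INTO issues (issue_id, issue_text, issue_status)
SELECT n,
       'test issue ' || n,
       CASE
         WHEN n <= 100   THEN 'ACTIVE'    -- 100 rows
         WHEN n <= 10100 THEN 'QUEUED'    -- 10,000 rows
         ELSE 'COMPLETE'                  -- 989,900 rows
       END
  FROM (SELECT (a.lvl - 1) * 1000 + b.lvl AS n
          FROM (SELECT level AS lvl FROM dual CONNECT BY level <= 1000) a
         CROSS JOIN
               (SELECT level AS lvl FROM dual CONNECT BY level <= 1000) b);

COMMIT;

-- Gather table and column statistics for ISSUES; cascade => TRUE also covers idx_issue_status
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER,
                                tabname => 'ISSUES',
                                cascade => TRUE);
END;
/

-- What the optimizer now knows about the table as a whole
SELECT num_rows, last_analyzed
  FROM user_tables
 WHERE table_name = 'ISSUES';

-- What it knows about the issue_status column
SELECT column_name, num_distinct, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'ISSUES'
   AND column_name = 'ISSUE_STATUS';
```

With the default METHOD_OPT setting, Oracle decides for itself whether issue_status gets a histogram (the HISTOGRAM column shows NONE or a type such as FREQUENCY), based partly on the data's skew and partly on whether the column has already been used in query predicates. Running the two queries once and then gathering again is a reasonable way to see the skew recorded.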

Gathering statistics allows the query plans to change over time as the data volumes and data distributions change. When you first install the issue tracker, you'll have very few COMPLETE issues and more ACTIVE and QUEUED issues. Over time, the number of COMPLETE issues rises much more quickly. As the table grows and the relative fractions of rows in the various statuses change, the query plans will change so that, in the ideal world, you always get the most efficient plan possible.
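
Oracle tracks how much DML each table receives so it can tell when statistics have drifted away from the real data. The sketch below assumes Oracle 10g or later (where this monitoring is on by default and statistics count as stale once roughly 10% of the rows have changed) and that your account is allowed to call DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO. In a normal installation the automatic statistics maintenance job does this refresh for you, so treat the manual version as an illustration.

```sql
DECLARE
  v_stale user_tab_statistics.stale_stats%TYPE;
BEGIN
  -- Push the in-memory DML counters to the dictionary so STALE_STATS is current
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

  SELECT stale_stats
    INTO v_stale
    FROM user_tab_statistics
   WHERE table_name = 'ISSUES'
     AND object_type = 'TABLE';

  -- NULL: no statistics yet; 'YES': too many rows changed since the last gather
  IF NVL(v_stale, 'YES') = 'YES' THEN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ISSUES');
  END IF;
END;
/
```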
