SQL SQL中的表扫描和索引扫描
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/8702905/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Table Scan and Index Scan in SQL
提问by Manisha Awasthi
What is the difference between Table scan and Index scan in SQL and where it is used specifically?
SQL中Table scan和Index scan有什么区别,具体用在什么地方?
回答by dani herrera
Table scan means iterate over all table rows.
表扫描意味着遍历所有表行。
Index scan means iterate over all index items, when item index meets search condition, table row is retrived through index.
索引扫描意味着遍历所有索引项,当项索引满足搜索条件时,通过索引检索表行。
Usualy index scan is less expensive than a table scan because index is more flat than a table.
通常索引扫描比表扫描便宜,因为索引比表更平坦。
They are lot of bibliografy about this issue. Sample:
他们有很多关于这个问题的参考书目。样本:
- Microsoft: Which is Faster: Index Access or Table Scan?:
- 微软:哪个更快:索引访问还是表扫描?:
Index access is an access method in which SQL Server uses an existing index to read and write data pages. Because index access significantly reduces the number of I/O read operations, it often outperforms a table scan.
索引访问是 SQL Server 使用现有索引读取和写入数据页的一种访问方法。由于索引访问显着减少了 I/O 读取操作的数量,因此它的性能通常优于表扫描。
- Oracle: The Query Optimizer
- Oracle:查询优化器
In this method, a row is retrieved by traversing the index, using the indexed column values specified by the statement. An index scan retrieves data from an index based on the value of one or more columns in the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.
在此方法中,通过使用语句指定的索引列值遍历索引来检索行。索引扫描根据索引中一个或多个列的值从索引中检索数据。为了执行索引扫描,Oracle 在索引中搜索语句访问的索引列值。如果语句只访问索引的列,则 Oracle 直接从索引而不是从表中读取索引列的值。
- MySql: How to Avoid Table Scans
- MySql:如何避免表扫描
回答by Olivier Jacot-Descombes
Most query engines have a query optimizer, which tries to generate an effective query execution strategy. If indexes are available, which can make a query faster, then the query optimizer will perform an index scan or index seek, otherwise a table scan.
大多数查询引擎都有一个查询优化器,它会尝试生成有效的查询执行策略。如果索引可用,这可以使查询更快,那么查询优化器将执行索引扫描或索引查找,否则执行表扫描。
Example:
例子:
SELECT * FROM tbl WHERE category_id = 5;
If there is no index on category_id then a table scan will be performed, i.e. every single record in the table will be inspected for the right category_id.
如果 category_id 上没有索引,则将执行表扫描,即将检查表中的每条记录以获取正确的 category_id。
If, however, category_id is indexed the things become more complicated. If the table is very large, an index seek will probably be chosen. However, if the table is small, then the optimizer might decide that a table scan is still faster, since some overhead is required to access an index. If the category_id is not selective enough, for instance if there are only two categories, scanning the table might be faster even for big tables.
但是,如果对 category_id 进行索引,事情就会变得更加复杂。如果表非常大,可能会选择索引查找。但是,如果表很小,那么优化器可能会决定表扫描仍然更快,因为访问索引需要一些开销。如果 category_id 的选择性不够,例如如果只有两个类别,即使对于大表,扫描表也可能更快。
Indexes are usually organized as tree structures. Finding an item in a tree is an O(log n) operation. A table scan is an O(n) operation. The speed is mainly determined by the number of disk accesses required to perform the query. Seeking the index first and then accessing the table for the found entries can generate more disk accesses for small tables.
索引通常组织为树结构。在树中查找项目是一个 O(log n) 操作。表扫描是 O(n) 操作。速度主要取决于执行查询所需的磁盘访问次数。首先查找索引,然后访问找到的条目的表可以为小表生成更多的磁盘访问。
Let us have a look at another query:
让我们看看另一个查询:
SELECT category_id FROM tbl WHERE category_id BETWEEN 10 AND 100;
Here there is another option available. An index seek might not be faster than a table scan in this situation, but, since we are only retrieving catergory_id's an index scan (not index seek) might be even faster. An index scan reads every entry of the index table instead of taking advantage of the tree structure (what the index seek does). However, since the requested information is fully contained in the index, no access to the data table will be required. Index scan is, like the table scan an O(n) operation, but since the index is usually smaller than the table, fewer disk accesses are required to scan the index than to scan the table.
这里还有另一种选择。在这种情况下,索引查找可能不会比表扫描快,但是,由于我们只检索 category_id 的索引扫描(不是索引查找)可能会更快。索引扫描读取索引表的每个条目,而不是利用树结构(索引查找的作用)。但是,由于请求的信息完全包含在索引中,因此不需要访问数据表。索引扫描与表扫描一样是 O(n) 操作,但由于索引通常小于表,因此扫描索引所需的磁盘访问次数少于扫描表所需的磁盘访问次数。
The whole matter is very complicated and depends very much on the database engine. If you want to know more, read the documentation provided by the db vendor.
整个事情非常复杂,很大程度上取决于数据库引擎。如果您想了解更多信息,请阅读 db 供应商提供的文档。
回答by Ben
As @danihp has answered the first part of the question I'll attempt to answer the second "where is it used specifically". This is for Oracle but it holds true for most RDBMS.
由于@danihp 已经回答了问题的第一部分,我将尝试回答第二个“它具体在哪里使用”。这适用于 Oracle,但适用于大多数 RDBMS。
Let's assume we have a table my_table
, which is indexed uniquely on a column id
and has a second index, which is non-unique, on the column yet_another_column
:
假设我们有一个 table my_table
,它在列上唯一id
索引,并且在列上有第二个非唯一索引yet_another_column
:
create my_table ( id varchar2(20) not null
, another_column not null
, yet_another_column
, constraint pk_my_table primary key (id)
);
create index i_my_table on my_table ( yet_another_column );
Now, if we were to select * from my_table where id = '1'
this would / should do a unique index scanof the index pk_my_table
. Then we re-enter the table, using the index, to return everything in my_table
where id = '1'
.
现在,如果我们select * from my_table where id = '1'
这个会/应该做一个唯一索引扫描索引pk_my_table
。然后我们重新输入表,使用索引,返回my_table
where 中的所有内容id = '1'
。
If the query were, instead, select id from my_table where id = 'a'
then there is no need for the second stage as all the values we need are contained within the index. In this case the query would solely do an unique index scan.
如果查询是,select id from my_table where id = 'a'
则不需要第二阶段,因为我们需要的所有值都包含在索引中。在这种情况下,查询将单独执行唯一索引扫描。
Next, if our query were select * from my_table where yet_another_column = 'y'
then we have an index on the column but it is not uniqueso we're going to have to look through the entire index to try to find all values that match our where condition, i.e. an index scan. Once again we're select columns that are not in our index so we have to re-enter the table to get them.
接下来,如果我们的查询是,select * from my_table where yet_another_column = 'y'
那么我们在列上有一个索引,但它不是唯一的,所以我们将不得不查看整个索引以尝试找到与我们的 where 条件匹配的所有值,即索引扫描。我们再次选择不在索引中的列,因此我们必须重新输入表以获取它们。
Lastly, if our query were select id from my_table where another_column = 'yes'
. We have no index on another_column
so we have to do a table scanto find the value, i.e. we have to find everything in the table where another_column = 'yes'
.
最后,如果我们的查询是select id from my_table where another_column = 'yes'
. 我们没有索引,another_column
所以我们必须进行表扫描才能找到值,即我们必须找到表中的所有内容where another_column = 'yes'
。
Now, there might not seem to be much of a difference, between a table scan and an index scan in these instances. We still have to go and find a value in an object in the database. However, as the index is much smaller and specially designed to be scanned ( see other answers ) it is generallymuch faster to do an index scan if you only want a small proportion of the rows in the table. If you want say 10% of the table then this point becomes "it depends".
现在,在这些情况下,表扫描和索引扫描之间似乎没有太大区别。我们仍然需要在数据库中的一个对象中找到一个值。但是,由于索引要小得多并且专门设计用于扫描(请参阅其他答案),如果您只需要表中的一小部分行,则执行索引扫描通常要快得多。如果您想说表格的 10%,那么这一点就变成了“视情况而定”。
回答by Aaron Bertrand
For SQL Server at least:
至少对于 SQL Server:
An index scan can be faster because, presumably, the index doesn't cover the entire set of columns in the table, while a table (or clustered index) scan has to read all of the data. If an index does include all of the columns in the table, then it should be roughly equivalent to a table scan, and the choice between an index scan and table (or CIX) scan will be a coin toss. The difference is that when you have fewer columns in the index, you can fit more index rows on an 8kb page, leading to fewer overall pages you have to read in order to scan all of the data in the index.
索引扫描可以更快,因为大概索引不覆盖表中的整个列集,而表(或聚集索引)扫描必须读取所有数据。如果索引确实包括表中的所有列,那么它应该大致相当于表扫描,在索引扫描和表(或 CIX)扫描之间的选择将是掷硬币。不同之处在于,当索引中的列较少时,您可以在 8kb 的页面上放置更多的索引行,从而减少为扫描索引中的所有数据而需要读取的整体页面。
To illustrate what I mean, imagine if you have two copies of the phone book, one with last name, first name, street address, and phone number, and one with just last name, first name, and phone number. Now imagine that because the street address doesn't have to be printed, you can fit two extra columns of names and phone numbers on any page in the phone book. The end result of this is that the phone book is thinner, because you can fit the same number of phone numbers on fewer pages. Next, imagine you are charged with counting the number of phone numbers in the book. Which would you choose, the one with the street address listed (which has more pages, analogous to a table scan) or the one without the street address (which has fewer pages, analogous to most index scans)? I would choose the one with fewer pages.
为了说明我的意思,假设您有两份电话簿,一份有姓氏、名字、街道地址和电话号码,另一份只有姓氏、名字和电话号码。现在想象一下,由于不必打印街道地址,您可以在电话簿的任何页面上放置两列额外的姓名和电话号码。这样做的最终结果是电话簿更薄,因为您可以在更少的页面上放置相同数量的电话号码。接下来,假设您负责计算书中电话号码的数量。你会选择哪一个,列出街道地址(它有更多的页面,类似于表扫描)或没有街道地址的那一个(它有更少的页面,类似于大多数索引扫描)?我会选择页数较少的那一种。
Another wrinkle in this is that some indexes can be filtered, meaning that not only do they have fewer columns in most cases (and therefore can fit more rows onto a single page), but they can also have a WHERE clause that eliminates a lot of rows. In this case, as well, an index scan will be better than a table scan (but this will only work for queries that have a matching WHERE clause and the same semantics).
另一个问题是某些索引可以被过滤,这意味着它们不仅在大多数情况下具有更少的列(因此可以在单个页面上容纳更多行),而且它们还可以有一个 WHERE 子句来消除很多行。在这种情况下,索引扫描也比表扫描更好(但这仅适用于具有匹配 WHERE 子句和相同语义的查询)。