SQL 表扫描和聚集索引扫描有什么区别?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18764/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What's the difference between a Table Scan and a Clustered Index Scan?
提问by Seibar
Since both a Table Scan
and a Clustered Index Scan
essentially scan all records in the table, why is a Clustered Index Scan supposedly better?
由于 aTable Scan
和 aClustered Index Scan
本质上都扫描表中的所有记录,为什么聚集索引扫描应该更好?
As an example - what's the performance difference between the following when there are many records?:
例如 - 当有很多记录时,以下之间的性能差异是什么?:
declare @temp table(
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
-----------------------------
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
回答by Mark Brackett
In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.
在没有聚集索引的表(堆表)中,数据页没有链接在一起 - 因此遍历页需要查找索引分配映射。
A clustered table, however, has it's data pages linked in a doubly linked list- making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT
, UPDATE
, and DELETE
. A heap table, however, requires a second write to the IAM.
但是,聚簇表的数据页链接在双向链表中- 使顺序扫描速度更快。当然,作为交换,你必须处理保持数据页顺序的开销INSERT
,UPDATE
和DELETE
。但是,堆表需要对 IAM 进行第二次写入。
If your query has a RANGE
operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100
), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.
如果您的查询具有RANGE
运算符(例如SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100
:),那么聚簇表(按保证顺序排列)会更有效 - 因为它可以使用索引页来查找相关数据页。堆必须扫描所有行,因为它不能依赖于排序。
And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.
而且,当然,聚集索引可以让您执行 CLUSTERED INDEX SEEK,这对于性能来说非常理想……没有索引的堆总是会导致表扫描。
So:
所以:
For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.
For a query with a
WHERE
clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.
For
INSERT
,UPDATE
, andDELETE
a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.
对于您选择所有行的示例查询,唯一的区别是聚集索引维护的双向链表。这应该使您的聚簇表比具有大量行的堆快一点点。
对于带有
WHERE
可以(至少部分)由聚集索引满足的子句的查询,由于排序,您将领先 - 因此您不必扫描整个表。对于聚簇索引不满足的查询,您几乎甚至......同样,唯一的区别是用于顺序扫描的双向链表。在任何一种情况下,你都是次优的。
对于
INSERT
、UPDATE
和DELETE
堆可能会也可能不会赢。堆不必维护顺序,但确实需要对 IAM 进行第二次写入。我认为相对性能差异可以忽略不计,但也非常依赖于数据。
Microsoft has a whitepaperwhich compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):
Microsoft 有一份白皮书将聚集索引与堆上的等效非聚集索引进行比较(与我上面讨论的不完全相同,但很接近)。他们的结论基本上是在所有表上放置聚集索引。我会尽力总结他们的结果(再次注意,他们实际上是在比较非聚集索引和聚集索引——但我认为它是相对可比的):
INSERT
performance: clustered index wins by about 3% due to the second write needed for a heap.UPDATE
performance: clustered index wins by about 8% due to the second lookup needed for a heap.DELETE
performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.- single
SELECT
performance: clustered index wins by about 16% due to the second lookup needed for a heap. - range
SELECT
performance: clustered index wins by about 29% due to the random ordering for a heap. - concurrent
INSERT
: heap table wins by 30% under load due to page splits for the clustered index.
INSERT
性能:由于堆需要第二次写入,聚集索引的优势大约为 3%。UPDATE
性能:由于堆需要第二次查找,聚集索引的优势大约为 8%。DELETE
性能:由于需要第二次查找以及需要从 IAM 中对堆进行第二次删除,聚集索引的优势大约为 18%。- 单一
SELECT
性能:由于堆需要第二次查找,聚集索引的优势大约为 16%。 - 范围
SELECT
性能:由于堆的随机排序,聚集索引胜出约 29%。 - concurrent
INSERT
:由于聚集索引的页面拆分,堆表在负载下胜出 30%。
回答by Stu
http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx
http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx
The Clustered Index Scan logical and physical operator scans the clustered index specified in the Argument column. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows' output be returned in the order in which the clustered index has sorted them. If the ORDERED clause is not present, the storage engine will scan the index in the optimal way (not guaranteeing the output to be sorted).
Clustered Index Scan 逻辑和物理运算符扫描 Argument 列中指定的聚集索引。当存在可选的 WHERE:() 谓词时,仅返回满足谓词的那些行。如果 Argument 列包含 ORDERED 子句,则查询处理器已请求按聚集索引对它们进行排序的顺序返回行的输出。如果 ORDERED 子句不存在,则存储引擎将以最佳方式扫描索引(不保证对输出进行排序)。
http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx
http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx
The Table Scan logical and physical operator retrieves all rows from the table specified in the Argument column. If a WHERE:() predicate appears in the Argument column, only those rows that satisfy the predicate are returned.
表扫描逻辑和物理运算符从参数列中指定的表中检索所有行。如果 WHERE:() 谓词出现在 Argument 列中,则仅返回满足谓词的那些行。
回答by DrPizza
A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.
表扫描必须检查表的每一行。聚集索引扫描只需要扫描索引即可。它不会扫描表中的每条记录。这就是指数的重点。