SQL 表扫描和聚集索引扫描有什么区别?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/18764/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:07:22  来源:igfitidea点击:

What's the difference between a Table Scan and a Clustered Index Scan?

sqlsql-serverindexing

提问by Seibar

Since both a Table Scanand a Clustered Index Scanessentially scan all records in the table, why is a Clustered Index Scan supposedly better?

由于 aTable Scan和 aClustered Index Scan本质上都扫描表中的所有记录,为什么聚集索引扫描应该更好?

As an example - what's the performance difference between the following when there are many records?:

例如 - 当有很多记录时,以下之间的性能差异是什么?:

declare @temp table(
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

-----------------------------

declare @temp table(
    RowID int not null identity(1,1) primary key,
    SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

回答by Mark Brackett

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

在没有聚集索引的表(堆表)中,数据页没有链接在一起 - 因此遍历页需要查找索引分配映射

A clustered table, however, has it's data pages linked in a doubly linked list- making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

但是,聚簇表的数据页链接在双向链表中- 使顺序扫描速度更快。当然,作为交换,你必须处理保持数据页顺序的开销INSERTUPDATEDELETE。但是,堆表需要对 IAM 进行第二次写入。

If your query has a RANGEoperator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

如果您的查询具有RANGE运算符(例如SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100:),那么聚簇表(按保证顺序排列)会更有效 - 因为它可以使用索引页来查找相关数据页。堆必须扫描所有行,因为它不能依赖于排序。

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

而且,当然,聚集索引可以让您执行 CLUSTERED INDEX SEEK,这对于性能来说非常理想……没有索引的堆总是会导致表扫描。

So:

所以:

  • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

  • For a query with a WHEREclause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

  • For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

  • For INSERT, UPDATE, and DELETEa heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

  • 对于您选择所有行的示例查询,唯一的区别是聚集索引维护的双向链表。这应该使您的聚簇表比具有大量行的堆快一点点。

  • 对于带有WHERE可以(至少部分)由聚集索引满足的子句的查询,由于排序,您将领先 - 因此您不必扫描整个表。

  • 对于聚簇索引不满足的查询,您几乎甚至......同样,唯一的区别是用于顺序扫描的双向链表。在任何一种情况下,你都是次优的。

  • 对于INSERTUPDATEDELETE堆可能会也可能不会赢。堆不必维护顺序,但确实需要对 IAM 进行第二次写入。我认为相对性能差异可以忽略不计,但也非常依赖于数据。

Microsoft has a whitepaperwhich compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):

Microsoft 有一份白皮书将聚集索引与堆上的等效非聚集索引进行比较(与我上面讨论的不完全相同,但很接近)。他们的结论基本上是在所有表上放置聚集索引。我会尽力总结他们的结果(再次注意,他们实际上是在比较非聚集索引和聚集索引——但我认为它是相对可比的):

  • INSERTperformance: clustered index wins by about 3% due to the second write needed for a heap.
  • UPDATEperformance: clustered index wins by about 8% due to the second lookup needed for a heap.
  • DELETEperformance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
  • single SELECTperformance: clustered index wins by about 16% due to the second lookup needed for a heap.
  • range SELECTperformance: clustered index wins by about 29% due to the random ordering for a heap.
  • concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.
  • INSERT性能:由于堆需要第二次写入,聚集索引的优势大约为 3%。
  • UPDATE性能:由于堆需要第二次查找,聚集索引的优势大约为 8%。
  • DELETE性能:由于需要第二次查找以及需要从 IAM 中对堆进行第二次删除,聚集索引的优势大约为 18%。
  • 单一SELECT性能:由于堆需要第二次查找,聚集索引的优势大约为 16%。
  • 范围SELECT性能:由于堆的随机排序,聚集索引胜出约 29%。
  • concurrent INSERT:由于聚集索引的页面拆分,堆表在负载下胜出 30%。

回答by Stu

http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx

http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx

The Clustered Index Scan logical and physical operator scans the clustered index specified in the Argument column. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows' output be returned in the order in which the clustered index has sorted them. If the ORDERED clause is not present, the storage engine will scan the index in the optimal way (not guaranteeing the output to be sorted).

Clustered Index Scan 逻辑和物理运算符扫描 Argument 列中指定的聚集索引。当存在可选的 WHERE:() 谓词时,仅返回满足谓词的那些行。如果 Argument 列包含 ORDERED 子句,则查询处理器已请求按聚集索引对它们进行排序的顺序返回行的输出。如果 ORDERED 子句不存在,则存储引擎将以最佳方式扫描索引(不保证对输出进行排序)。

http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx

http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx

The Table Scan logical and physical operator retrieves all rows from the table specified in the Argument column. If a WHERE:() predicate appears in the Argument column, only those rows that satisfy the predicate are returned.

表扫描逻辑和物理运算符从参数列中指定的表中检索所有行。如果 WHERE:() 谓词出现在 Argument 列中,则仅返回满足谓词的那些行。

回答by DrPizza

A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.

表扫描必须检查表的每一行。聚集索引扫描只需要扫描索引即可。它不会扫描表中的每条记录。这就是指数的重点。