SQL 什么是索引,非聚集索引可以是非唯一的吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3800918/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
what is index and can non-clustered index be non-unique?
提问by Gennady Vanin Геннадий Ванин
Subquestion to my question [1]:
我的问题 [1] 的子问题:
All definitions of (MS SQL Server) index (that I could find) are ambiguous and all explanations, based on it, narrate something using undefined or ambiguously defined terms.
What is the definition of index?
(MS SQL Server) 索引(我能找到的)的所有定义都是模棱两可的,所有基于它的解释都使用未定义或定义不明确的术语来叙述某些内容。
index的解释是什么?
For ex., the most common definition of index from wiki (http://en.wikipedia.org/wiki/Index_(database)) :
例如,wiki 中最常见的索引定义(http://en.wikipedia.org/wiki/Index_(database)):
- 1) "A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indexes can be created using one or more columns of a database table..."
- 2) "SQL server creates a clustered index on a primary key by default[1]. The data is present in random order, but the logical ordering is specified by the index. The data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page"
- 1)“数据库索引是一种数据结构,它提高了对数据库表的数据检索操作的速度,代价是写入速度变慢和存储空间增加。可以使用数据库表的一列或多列创建索引......”
- 2)》SQL Server默认在主键上创建聚集索引[1]。数据以随机顺序出现,但逻辑顺序由索引指定。数据行可能随机分布在整个表中。非- 聚集索引树包含按排序顺序的索引键,索引的叶级包含指向页的指针和数据页中的行号"
Well, it is ambiguous. One can understand under index:
嗯,这是模棱两可的。在索引下可以理解:
- 1) an ordered data structure, a tree, containing intermediate and leaf nodes;
- 2) leaf node data containing values from indexed columns + "pointer to the page and the row number in the data page"
- 1) 一个有序的数据结构,一棵树,包含中间节点和叶子节点;
- 2) 包含来自索引列的值的叶节点数据+“指向页面的指针和数据页面中的行号”
Can non-clustered index be non-unique, considering 2)? or, even, 1) ?
It doesn't seem so to me ...
考虑到 2),非聚集索引可以是非唯一的吗?或者,甚至,1)?
在我看来不是这样...
But does TSQL imply existence of non-unique non-clustered index?
但是 TSQL 是否意味着存在非唯一非聚集索引?
If yes, then What is understood by non-clustered index in "CREATE INDEX (Transact-SQL)"[2] and to what the argument UNIQUE is applied there?
如果是,那么“CREATE INDEX (Transact-SQL)”[2] 中的非聚集索引是什么理解以及在那里应用 UNIQUE 参数的内容是什么?
Is it:
是吗:
- 3) leaf node data containing values from indexed columns? i.e. like in 2) but without pointer + row number ) ?
- 3) 包含来自索引列的值的叶节点数据?即像在 2) 但没有指针 + 行号 ) ?
If it is 3), then again question 1) arises - why to apply constraints to copy of real data in "index", instead of real data in-situ?
如果是 3),那么问题 1) 又会出现——为什么要对“索引”中的真实数据的复制应用约束,而不是原位的真实数据?
Update:
Is not bookmark (pointer+row number) to a real data row unique (uniquely identify row)?
Doesn't this bookmark constitute part of the index and thereby makes the index unique?
Can you give me the definition of the index instead of explaining how to use it UNDEFINED? The latter part I already know (or can read myself).
更新:
书签(指针+行号)不是指向真实数据行的唯一(唯一标识行)吗?
这个书签不是索引的一部分,从而使索引唯一吗?
你能给我索引的定义而不是解释如何使用它 UNDEFINED 吗?后一部分我已经知道(或者可以自己阅读)。
[1]
"UNIQUE argument for INDEX creation - what's for?"
UNIQUE argument for INDEX creation - what's for?
[1]
“索引创建的唯一参数 - 是为了什么?”
INDEX 创建的唯一参数 - 是为了什么?
[2]
[CREATE INDEX (Transact-SQL)]
http://msdn.microsoft.com/en-us/library/ms188783.aspx
[2]
[创建索引 (Transact-SQL)]
http://msdn.microsoft.com/en-us/library/ms188783.aspx
回答by marc_s
An index is a data structure designed to optimize querying large data sets. As such, no claim is made about whether or not anything is unique at this point.
索引是一种数据结构,旨在优化查询大数据集。因此,目前没有任何关于任何东西是否独特的声明。
You can definitely have non-unique non-clustered indices - how else could you index on lastname, firstname ?? That's nevergoing to be unique (e.g. on Facebook.....)
你绝对可以有非唯一的非聚集索引——你怎么能在姓氏、名字上建立索引??这永远不会是独一无二的(例如在 Facebook 上......)
You can define an index as being unique - this just adds the extra check to it that no duplicate values are allowed. If you would make your index on (lastname, firstname) UNIQUE, then the second Brad Pitt to sign up on your site couldn't do so, since that unique index would reject his data.
您可以将索引定义为唯一的 - 这只是添加了额外的检查,不允许重复值。如果您要在(姓,名)上建立唯一索引,那么在您的站点上注册的第二个布拉德皮特就不能这样做,因为该唯一索引会拒绝他的数据。
One exception is the primary keyon any given table. The primary key is the logical identifier used to uniquely and precisely identify each single row in your database. As such, it must be unique over all rows and cannot contain any NULL values.
一个例外是任何给定表上的主键。主键是用于唯一且精确地标识数据库中每一行的逻辑标识符。因此,它在所有行中必须是唯一的,并且不能包含任何 NULL 值。
The clustered indexin SQL Server is special in that they do contain the actual data in their leaf nodes. There's no restriction up to this point - however: the clustered index is also being used to uniquely locate (physically locate) the data in your database, and thus, the clustered index mustbe unique - it must be able to tell Brad Pitt #1 and Brad Pitt #2 apart. If you don't take care and provide a unique set of columns to your clustered index, SQL Server will add a "uniquefier" (a 4-byte INT) to those rows that aren't unique, e.g. you'd get BradPitt001 and BradPitt002 (or something like that).
SQL Server 中的聚集索引的特殊之处在于它们确实在其叶节点中包含实际数据。到目前为止没有任何限制 - 但是:聚集索引还用于唯一地定位(物理定位)数据库中的数据,因此,聚集索引必须是唯一的 - 它必须能够告诉 Brad Pitt #1和布拉德皮特 #2 分开。如果您不小心为聚集索引提供一组唯一的列,SQL Server 将向那些不唯一的行添加一个“唯一标识符”(一个 4 字节的 INT),例如您会得到 BradPitt001 和BradPitt002(或类似的东西)。
The clustered index is used as the "pointer" to the actual data row in your SQL Server table, so it's included in every single non-clustered index, too. So your non-clustered, non-unique index on (lastname, firstname) would not only contain these two fields, but in reality, it also contains the clustered keyon that table - that's why it's important the clustered key on a SQL Server table is small, stable, and unique - typically an INT.
聚集索引用作 SQL Server 表中实际数据行的“指针”,因此它也包含在每个非聚集索引中。因此,您对 (lastname, firstname) 的非聚集、非唯一索引不仅包含这两个字段,而且实际上还包含该表上的聚集键- 这就是 SQL Server 表上的聚集键很重要的原因小、稳定且独特 - 通常是 INT。
So your non-clustered index on (lastname, firstname) will really have (lastname, firstname, personID) and will have entries like (Pitt, Brad, 10176)
, (Pitt, Brad, 17665)
and so forth. When you search for "Brad Pitt" in your non-clustered index, SQL Server will now find these two entries, and for both, it has the "physical pointer" to where to find the rest of the data for those two guys, so if you ask for more than just the first- and last name, SQL Server could now go grab the whole row for each of the two Brad Pitt entries and provide you with the data the query requires.
所以,你在非聚集索引(姓氏,名字)才会真正具有(姓氏,名字,PERSONID),并有类似的条目(Pitt, Brad, 10176)
,(Pitt, Brad, 17665)
等等。当您在非聚集索引中搜索“Brad Pitt”时,SQL Server 现在将找到这两个条目,并且对于这两个条目,它都有一个“物理指针”,指向在哪里可以找到这两个人的其余数据,所以如果您要求的不仅仅是名字和姓氏,SQL Server 现在可以为两个 Brad Pitt 条目中的每一个获取整行,并为您提供查询所需的数据。
回答by pascal
The definition of an index is the first part of Wikipedia definition "A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space."
索引的定义是维基百科定义的第一部分“数据库索引是一种数据结构,它以较慢的写入和增加的存储空间为代价提高对数据库表的数据检索操作的速度。”
Then you have unique indexes, as a special kind of index, which ensure that indexed values are unique.
然后你有唯一索引,作为一种特殊的索引,它确保索引值是唯一的。
How it's implemented... depends on the DBMS. But it does not change the definition of index, or unique index.
它是如何实现的……取决于 DBMS。但它不会改变索引或唯一索引的定义。
As an implementation detail, MS SQL allows non-clustered (the usual kind, which is a tree with pointers to the actual row contents in a separate space, which you numbered 2.), and clustered (where rows are stored in the index, according the indexed value, which you numbered 1.) indexes.
作为一个实现细节,MS SQL 允许非聚集(通常的那种,它是一个树,指针指向单独空间中的实际行内容,您编号为 2。)和聚集(其中行存储在索引中,根据索引值,您将其编号为 1.) 索引。
So an non-unique non-clustered index is just (conceptualy) a tree of values with, for each value, a set of pointers to table rows containing this value.
因此,非唯一非聚集索引只是(概念上)一棵值树,对于每个值,都有一组指向包含该值的表行的指针。