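Do SQL clustered indexes have to be unique?
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4332982/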
Do clustered indexes have to be unique?
Asked by thomaspaulb
What happens if a clustered index is not unique? Can it lead to bad performance because inserted rows flow to an "overflow" page of some sort?
Is it "made" unique and if so how? What is the best way to make it unique?
I am asking because I am currently using a clustered index to divide my table into logical parts, but the performance is so-so, and recently I got the advice to make my clustered indexes unique. I'd like a second opinion on that.
Thanks!
Answered by Lieven Keersmaekers
They don't have to be unique, but it certainly is encouraged.
I haven't yet encountered a scenario where I wanted to create a CI on a non-unique column.
What happens if you create a CI on a non-unique column?
If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier.
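As a minimal T-SQL sketch of the case described above: the table dbo.Orders, its columns, and the index name are hypothetical and chosen only for illustration. Because the index is created without the UNIQUE keyword, duplicate CustomerId values are allowed, and SQL Server is then responsible for telling those rows apart internally.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE dbo.Orders (
    OrderId    INT IDENTITY(1,1) NOT NULL,
    CustomerId INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);

-- Non-unique clustered index: the UNIQUE keyword is omitted, so many
-- rows may share the same CustomerId. SQL Server adds the hidden
-- uniqueifier to rows whose key duplicates an existing one.
CREATE CLUSTERED INDEX CIX_Orders_CustomerId
    ON dbo.Orders (CustomerId);
```

The uniqueifier is not a column you can select or index yourself; it only shows up indirectly, for example as extra bytes per row.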
Does this lead to bad performance?
Adding a uniqueifier certainly adds some overhead in calculating and storing it.
Whether this overhead is noticeable depends on several factors:
- How much data the table contains.
- The rate of inserts.
- How often the CI is used in a select (when no covering indexes exist, pretty much always).
Edit
As Remus pointed out in the comments, there are use cases where creating a non-unique CI is a reasonable choice. My not having encountered one of those scenarios merely shows my own lack of exposure or competence (take your pick).
Answered by marc_s
I like to check out what The Queen of Indexing, Kimberly Tripp, has to say on the topic:
I'm going to start with my recommendation for the Clustering Key - for a couple of reasons. First, it's an easy decision to make and second, making this decision early helps to proactively prevent some types of fragmentation. If you can prevent certain types of base-table fragmentation then you can minimize some maintenance activities (some of which, in SQL Server 2000 AND less of which, in SQL Server 2005) require that your table be offline. OK, I'll get to the rebuild stuff later.....
Let's start with the key things that I look for in a clustering key:
* Unique
* Narrow
* Static
Why Unique? A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes. Take for example an index in the back of a book - if you need to find the data that an index entry points to - that entry (the index entry) must be unique; otherwise, which index entry would be the one you're looking for? So, when you create the clustered index - it must be unique. But, SQL Server doesn't require that your clustering key is created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique then SQL Server will “uniquify” it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustered index rebuild (which because of the poor choice for the clustering key is now more likely).
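One way to see the storage cost mentioned in the quote is to look at record sizes in the clustered index. The query below is only a sketch: it assumes a table named dbo.Orders (a hypothetical name) exists in the current database and has a clustered index, and it uses the sys.dm_db_index_physical_stats function available in SQL Server 2005 and later.

```sql
-- Sketch: inspect record sizes of the clustered index (index_id = 1)
-- on a hypothetical table dbo.Orders in the current database.
SELECT index_level,
       page_count,
       record_count,
       min_record_size_in_bytes,
       max_record_size_in_bytes,
       avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                    -- current database
         OBJECT_ID(N'dbo.Orders'),   -- assumed table name
         1,                          -- 1 = the clustered index
         NULL,                       -- all partitions
         'DETAILED');                -- needed to populate record sizes
```

Comparing these numbers for the same data under a non-unique and a unique clustering key should give a rough idea of how much the uniqueifier costs in a particular table; the exact figures depend on the data.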
Source: Ever-increasing clustering key debate - again!
Answered by ChrisW
Do clustered indexes have to be unique?
They don't, and there are times when it's better if they're not.
Consider a table with a semi-random, unique EmployeeId, and a DepartmentId for each employee: if your select statement is
SELECT * FROM EmployeeTable WHERE DepartmentId=%DepartmentValue%
then it's best for performance if the DepartmentId is the clustered index even though (or even especially because) it's not the unique index (best for performance because it ensures that all the records within a given DepartmentId are clustered).
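The sketch below shows one way this could look in T-SQL. The column types, the Name column, and the constraint and index names are assumptions added for illustration. It also shows a common variation that is not part of the answer above: appending EmployeeId to the clustering key keeps each department's rows together while making the key unique, so no uniqueifier is needed.

```sql
-- Hypothetical schema; names and types are illustrative assumptions.
CREATE TABLE dbo.EmployeeTable (
    EmployeeId   INT           NOT NULL,
    DepartmentId INT           NOT NULL,
    Name         NVARCHAR(100) NOT NULL,
    -- Keep the primary key non-clustered so the clustered index
    -- can be placed on the department instead.
    CONSTRAINT PK_EmployeeTable PRIMARY KEY NONCLUSTERED (EmployeeId)
);

-- Rows are ordered by department first, so one department's rows sit
-- together; EmployeeId as a second key column makes the key unique.
CREATE UNIQUE CLUSTERED INDEX CIX_EmployeeTable_Dept_Emp
    ON dbo.EmployeeTable (DepartmentId, EmployeeId);

-- The query from the answer, with a variable in place of the placeholder.
DECLARE @DepartmentValue INT = 42;
SELECT * FROM dbo.EmployeeTable WHERE DepartmentId = @DepartmentValue;
```

To match the answer literally, drop the UNIQUE keyword and the EmployeeId column from the index definition; the trade-off is a narrower declared key against the hidden uniqueifier on duplicate rows.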
Do you have any references?
There's Clustered Index Design Guidelines, for example, which says,
With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:
- Can be used for frequently used queries.
- Provide a high degree of uniqueness.
- Can be used in range queries.
My understanding of "high degree of uniqueness", for example, is that it isn't good to choose "Country" as the clustered index if most of your queries select the records within a given town.