INT vs Unique-Identifier for ID field in database
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/1151625/
Asked by mkchandler
I am creating a new database for a web site using SQL Server 2005 (possibly SQL Server 2008 in the near future). As an application developer, I've seen many databases that use an integer (or bigint, etc.) for the ID field of a table that will be used for relationships. But lately I've also seen databases that use the unique identifier (GUID) for an ID field.
My question is whether one has an advantage over the other. Will integer fields be faster for querying, joining, etc.?
UPDATE: To make it clear, this is for a primary key in the tables.
Answered by Remus Rusanu
GUIDs are problematic as clustered keys because of their high randomness. This issue was addressed by Paul Randal in the last Technet Magazine Q&A column: "I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"
Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as 'ID', but it is unclear whether you mean it as the clustered key or just the primary key. Typically the two overlap, so I'll assume you want to use it as the clustered index. The reasons why that is a poor choice are explained in the article I mentioned above.
For non-clustered indexes GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. Again, the randomness of GUIDs introduces page splits and fragmentation, albeit at the non-clustered index level only (a much smaller problem).
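To make the page-split point concrete, here is a rough Python sketch, not SQL Server's actual storage engine. It models leaf pages as fixed-capacity sorted lists. The name `simulate_inserts`, the capacity of 100 keys per page, and the 50/50 split rule are all assumptions chosen for illustration.

```python
import bisect
import random

def simulate_inserts(keys, page_capacity=100):
    """Insert keys one at a time into sorted leaf pages.

    Returns (splits, page_count, average_fill). This is a toy model:
    a full page hit by a mid-index insert is split 50/50, while an
    insert past the end of the last page simply starts a new page.
    """
    pages = [[]]   # leaf pages, each a sorted list of keys
    lows = []      # lows[i] is the smallest key on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)   # index of the target page
        page = pages[i]
        if len(page) == page_capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Ever-increasing key: open a fresh page, nothing moves.
                pages.append([key])
                lows.append(key)
                continue
            # Random position in a full page: move half the keys out.
            half = page_capacity // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(i + 1, new_page)
            lows.insert(i, new_page[0])
            splits += 1
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    fill = sum(len(p) for p in pages) / (len(pages) * page_capacity)
    return splits, len(pages), fill

# Sequential keys stand in for an identity column; seeded 128-bit random
# numbers stand in for random GUIDs.
rng = random.Random(42)
sequential = simulate_inserts(range(20_000))
scattered = simulate_inserts([rng.getrandbits(128) for _ in range(20_000)])
print("sequential (splits, pages, fill):", sequential)
print("random     (splits, pages, fill):", scattered)
```

In this model the ever-increasing keys never trigger a split and leave the pages full, while the random keys split repeatedly and leave the pages partly empty, which is the fragmentation described above.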
There are many urban legends surrounding GUID usage that condemn GUIDs based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A 16-byte key can still be a very performant key on a properly designed data model. While it is true that being 4 times as big as an int results in lower-density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree structure is a naturally well-balanced tree and the depth of tree traversal is seldom an issue, so seeking a value based on a GUID key as opposed to an INT key is similar in performance. A leaf-page traversal (i.e. a table scan) does not look at the non-leaf pages, and the impact of GUID size on the leaf pages is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take the hearsay advice based on '16 bytes vs. 4' with a rather large grain of salt. Analyze case by case and decide if the size impact makes a real difference: how many other columns are in the table (i.e. how much impact the GUID size has on the leaf pages) and how many references are using it (i.e. how many other tables will grow because they need to store a larger foreign key).
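A back-of-the-envelope calculation can illustrate the depth argument. The figures below are assumptions, not measured SQL Server internals: roughly 8096 usable bytes per 8 KB page, 11 bytes of per-entry overhead in non-leaf pages, and a 200-byte row when the key is an INT. The helper names are hypothetical.

```python
import math

PAGE_BYTES = 8096       # assumption: usable bytes on an 8 KB page
ENTRY_OVERHEAD = 11     # assumption: per-entry overhead in non-leaf pages

def leaf_pages(rows, row_bytes):
    """Leaf pages needed to hold `rows` records of `row_bytes` each."""
    rows_per_page = PAGE_BYTES // row_bytes
    return math.ceil(rows / rows_per_page)

def non_leaf_levels(leaf_page_count, key_bytes):
    """Index levels stacked above the leaf level for a given key size."""
    fanout = PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)
    levels, pages = 0, leaf_page_count
    while pages > 1:
        pages = math.ceil(pages / fanout)
        levels += 1
    return levels

ROWS = 10_000_000
INT_ROW_BYTES = 200                    # assumption: row size with a 4-byte key
GUID_ROW_BYTES = INT_ROW_BYTES + 12    # same row with a 16-byte key

int_leaves = leaf_pages(ROWS, INT_ROW_BYTES)
guid_leaves = leaf_pages(ROWS, GUID_ROW_BYTES)
print("leaf pages  INT:", int_leaves, " GUID:", guid_leaves)
print("levels above leaf  INT:", non_leaf_levels(int_leaves, 4),
      " GUID:", non_leaf_levels(guid_leaves, 16))
```

Under these assumed numbers the GUID index ends up with a few percent more leaf pages and at most one extra level above the leaves. Different row sizes or overheads will shift the figures, so treat it as a way to run the case-by-case analysis, not as a benchmark.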
I'm calling out all these details in a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately and some of it is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking data movement, be it via replication or sync framework or whatever). I've seen bad decisions being made based on the GUIDs' bad reputation, when they were shunned without proper consideration. But it is true: if you have to use a GUID as the clustered key, make sure you address the randomness issue and use sequential GUIDs when possible.
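As an illustration of what a "sequential" GUID means, here is a minimal Python sketch of a time-ordered 16-byte identifier. It is not the algorithm behind SQL Server's newsequentialid(), and it does not set RFC version bits. The function name `time_ordered_guid` and the 6/2/8 byte layout are assumptions made for this example.

```python
import os
import uuid

def time_ordered_guid(timestamp_ms, counter=0):
    """Build a 16-byte identifier whose leading bytes grow over time.

    Layout (an assumption for this sketch):
      6 bytes  millisecond timestamp, big-endian
      2 bytes  counter for IDs created within the same millisecond
      8 bytes  random, so separate machines are unlikely to collide
    """
    head = timestamp_ms.to_bytes(6, "big") + counter.to_bytes(2, "big")
    return uuid.UUID(bytes=head + os.urandom(8))

# IDs created at later timestamps compare greater in Python, so inserts
# land at the end of an index ordered the same way.
start_ms = 1_700_000_000_000
ids = [time_ordered_guid(start_ms + i) for i in range(5)]
for value in ids:
    print(value)
```

One caveat: Python compares these values byte by byte from the left, whereas SQL Server orders uniqueidentifier values using a different byte order, so this exact layout would not sort the same way there. The sketch only shows the idea of putting a monotonic component in the most significant position.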
And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.
Answered by JBrooks
The GUID is going to take up more space and be slower than an int, even if you use the newsequentialid() function. If you are going to do replication or use the sync framework, you pretty much have to use a GUID.
Answered by Philip Kelley
INTs are 4 bytes, BIGINTs are 8 bytes, and GUIDs are 16 bytes. The more space required to represent the data, the more resources required to process it (disk space, memory, etc.). So (a) they're slower, but (b) this probably only matters if volume is an issue (millions of rows, or thousands of transactions in very, very little time).
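The size difference is easy to put into numbers. The sketch below only counts the bytes of the key values themselves, once per row plus once per referencing foreign-key row. Index overhead, row headers, and compression are ignored, and the row counts are made-up inputs.

```python
KEY_BYTES = {"INT": 4, "BIGINT": 8, "GUID": 16}

def key_storage_bytes(rows, key_type, referencing_rows=0):
    """Bytes spent on key values: one per row plus one per foreign-key copy."""
    return KEY_BYTES[key_type] * (rows + referencing_rows)

# Hypothetical workload: 10 million parent rows, 30 million child rows
# that each store the parent key as a foreign key.
for key_type in KEY_BYTES:
    total = key_storage_bytes(10_000_000, key_type, referencing_rows=30_000_000)
    print(f"{key_type:6s} {total / 1_000_000:8.0f} MB")
```

For that made-up workload the gap between INT and GUID is a few hundred megabytes of key data, which may or may not matter depending on the volume, as point (b) says.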
The advantage of GUIDs is that they are (pretty much) Globally Unique. Generate a GUID using the proper algorithm (and SQL Server xxxx will use the proper algorithm), and no two GUIDs will ever be alike, no matter how many computers you have generating them, no matter how frequently. (This does not apply after 72 years of usage; I forget the details.)
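The "pretty much" can be quantified with the birthday bound, assuming purely random version-4 style GUIDs, which carry 122 random bits. That is an assumption for this sketch: the answer does not say which algorithm it has in mind, and timestamp-based GUIDs behave differently.

```python
import math

def collision_probability(n_ids, random_bits=122):
    """Birthday-bound approximation: p = 1 - exp(-n^2 / (2 * 2^bits))."""
    exponent = (n_ids * n_ids) / (2.0 * 2.0 ** random_bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)

for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,d} random GUIDs -> collision probability ~ {collision_probability(n):.3e}")
```

Even at a trillion identifiers the approximate probability of any collision stays far below anything you would plan for, which is why random GUIDs can be generated on many machines without coordination.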
If you need unique identifiers generated across multiple servers, GUIDs may be useful. If you need mondo performance and have under 2 billion values, ints are probably fine. Lastly and perhaps most importantly, if your data has natural keys, stick with them and forget the surrogate values.
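The "under 2 billion" figure comes from the range of a signed 32-bit integer. Here is a small sketch for checking how long a key range lasts, assuming values are handed out one at a time starting at 1 with no gaps. The insert rates are made-up inputs.

```python
INT_MAX = 2**31 - 1       # largest signed 32-bit value: 2,147,483,647
BIGINT_MAX = 2**63 - 1    # largest signed 64-bit value

SECONDS_PER_YEAR = 365 * 24 * 3600

def years_until_exhausted(max_value, inserts_per_second):
    """Years until a 1, 2, 3, ... key sequence reaches max_value."""
    return max_value / inserts_per_second / SECONDS_PER_YEAR

print("INT at 1 insert/s:     ", round(years_until_exhausted(INT_MAX, 1), 1), "years")
print("INT at 100 inserts/s:  ", round(years_until_exhausted(INT_MAX, 100), 2), "years")
print("BIGINT at 1M inserts/s:", round(years_until_exhausted(BIGINT_MAX, 1_000_000)), "years")
```

If the expected insert rate puts INT exhaustion within the lifetime of the system, BIGINT is the obvious step up before reaching for a GUID.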
Answered by Hyman Marchetti
If you positively, absolutely have to have a unique ID, then use a GUID. Meaning, if you're ever going to merge, sync, or replicate, you probably should use a GUID.
For less robust things, an int should suffice, depending on how large the table will grow.
As in most cases, the proper answer is: it depends.
Answered by gbn
Use them for replication etc., not as primary keys.
- Against: space, not strictly monotonic, page splits, bookmark/RIDs etc.
- For: er...
Answered by Alex_L
I fully agree with JBrooks. I want to add that when your table is large and you use SELECTs with JOINs, especially with derived tables, using GUIDs can significantly decrease performance.