Advantages and disadvantages of GUID / UUID database keys
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/45399/
Question by Matt Sheppard
I've worked on a number of database systems in the past where moving entries between databases would have been made a lot easier if all the database keys had been GUID/UUID values. I've considered going down this path a few times, but there's always a bit of uncertainty, especially around performance and un-read-out-over-the-phone-able URLs.
Has anyone worked extensively with GUIDs in a database? What advantages would I get by going that way, and what are the likely pitfalls?
Answer by Nic Wise
Advantages:
- Can generate them offline.
- Makes replication trivial (as opposed to ints, which make it REALLY hard)
- ORMs usually like them
- Unique across applications. So we can use the PKs from our CMS (GUID) in our app (also GUID) and know we are NEVER going to get a clash.
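The first and last points above can be shown with a minimal sketch. It assumes Python's standard `sqlite3` and `uuid` modules, and the `item` tables and the `make_db`/`merge` helpers are hypothetical stand-ins for two separate databases. Keys are generated in the application without a database round trip. When rows from one database are copied into another with their keys intact, auto-increment integers collide and UUIDs do not.

```python
import sqlite3
import uuid


def make_db(names, use_uuid):
    """Create an in-memory database with a hypothetical `item` table."""
    db = sqlite3.connect(":memory:")
    if use_uuid:
        db.execute("CREATE TABLE item (id TEXT NOT NULL PRIMARY KEY, name TEXT)")
        # Keys are generated here, in the application, with no database round trip.
        rows = [(str(uuid.uuid4()), name) for name in names]
        db.executemany("INSERT INTO item (id, name) VALUES (?, ?)", rows)
    else:
        db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
        # Each database independently hands out 1, 2, 3, ...
        db.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in names])
    return db


def merge(src, dst):
    """Copy every row of src.item into dst.item, keeping the original keys.

    Returns the number of rows rejected because their key already exists in dst.
    """
    clashes = 0
    for row in src.execute("SELECT id, name FROM item").fetchall():
        try:
            dst.execute("INSERT INTO item (id, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            clashes += 1
    return clashes


print(merge(make_db(["a", "b", "c"], use_uuid=False), make_db(["x", "y"], use_uuid=False)))  # 2
print(merge(make_db(["a", "b", "c"], use_uuid=True), make_db(["x", "y"], use_uuid=True)))    # 0
```

With integer keys, rows 1 and 2 of the source collide with rows 1 and 2 of the destination. Avoiding that requires renumbering every key and every foreign key that points at them.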
Disadvantages:
- Larger space use, but space is cheap(er)
- Can't order by ID to get the insert order.
- Can look ugly in a URL, but really, WTF are you doing putting a REAL DB key in a URL!? (This point disputed in comments below)
- Harder to do manual debugging, but not that hard.
Personally, I use them for most PKs in any system of a decent size, but I got "trained" on a system which was replicated all over the place, so we HAD to have them. YMMV.
I think the duplicate data thing is rubbish: you can get duplicate data however you do it. Surrogate keys are usually frowned upon wherever I've been working. We DO use the WordPress-like system though:
- unique ID for the row (GUID/whatever). Never visible to the user.
- public ID is generated ONCE from some field (e.g. the title - make it the-title-of-the-article)
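The WordPress-like split described above can be sketched in Python as follows. The `Article` class and `slugify` helper are hypothetical illustrations, not part of WordPress or any library. Each record gets a hidden UUID as its row identity. A public, URL-friendly ID is derived from the title once, at creation time, so later title edits do not change it.

```python
import re
import unicodedata
import uuid


def slugify(title):
    """Turn a title into a URL-friendly slug, e.g. 'The Title' -> 'the-title'."""
    # Strip accents, then keep only ASCII letters and digits, joined by hyphens.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


class Article:
    """Hypothetical record with a hidden GUID key and a public slug."""

    def __init__(self, title):
        self.id = uuid.uuid4()      # row identity, never shown to the user
        self.title = title
        self.slug = slugify(title)  # public ID, generated once at creation


article = Article("The Title of the Article")
article.title = "A Better Title"  # later edits do not change the public ID
print(article.slug)               # the-title-of-the-article
```

A real system would also need to keep slugs unique, for example with a UNIQUE constraint on the slug column.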
UPDATE: So this one gets +1'ed a lot, and I thought I should point out a big downside of GUID PKs: clustered indexes.
If you have a lot of records, and a clustered index on a GUID, your insert performance will SUCK, as you get inserts in random places in the list of items (that's the point), not at the end (which is quick).
So if you need insert performance, maybe use an auto-increment INT, and generate a GUID if you want to share it with someone else (i.e., show it to a user in a URL).
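A minimal sketch of that compromise, assuming SQLite through Python's `sqlite3` module and a hypothetical `customer` table. In SQLite an `INTEGER PRIMARY KEY` column is the rowid that the table's B-tree is keyed on, so new rows are appended at the end. The GUID lives in a separate `UNIQUE` index, which still receives inserts at random positions but is not the table itself. Ordering by the integer key also recovers insert order, which a random GUID cannot do.

```python
import sqlite3
import uuid


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE customer (
               id        INTEGER PRIMARY KEY,   -- internal key: small, increasing
               public_id TEXT NOT NULL UNIQUE,  -- GUID shared in URLs
               name      TEXT NOT NULL
           )"""
    )
    return db


def add_customer(db, name):
    """Insert a customer; the database assigns `id`, the application assigns `public_id`."""
    public_id = str(uuid.uuid4())
    db.execute("INSERT INTO customer (public_id, name) VALUES (?, ?)", (public_id, name))
    return public_id


def find_by_public_id(db, public_id):
    """Resolve an ID taken from a URL back to the internal row."""
    return db.execute(
        "SELECT id, name FROM customer WHERE public_id = ?", (public_id,)
    ).fetchone()


db = make_db()
alice = add_customer(db, "Alice")
add_customer(db, "Bob")
print(find_by_public_id(db, alice))                                    # (1, 'Alice')
print(db.execute("SELECT name FROM customer ORDER BY id").fetchall())  # [('Alice',), ('Bob',)]
```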
Answer by Troels Arvin
@Matt Sheppard:
Say you have a table of customers. Surely you don't want a customer to exist in the table more than once, or lots of confusion will happen throughout your sales and logistics departments (especially if the multiple rows about the customer contain different information).
So you have a customer identifier which uniquely identifies the customer and you make sure that the identifier is known by the customer (in invoices), so that the customer and the customer service people have a common reference in case they need to communicate. To guarantee no duplicated customer records, you add a uniqueness-constraint to the table, either through a primary key on the customer identifier or via a NOT NULL + UNIQUE constraint on the customer identifier column.
Next, for some reason (which I can't think of), you are asked to add a GUID column to the customer table and make that the primary key. If the customer identifier column is now left without a uniqueness guarantee, you are asking for future trouble throughout the organization: the GUIDs will always be unique, so they can never catch a duplicated customer.
Some "architect" might tell you that "oh, but we handle the real customer uniqueness constraint in our app tier!". Right. Fashions in general-purpose programming languages and (especially) middle-tier frameworks change all the time, and will generally never outlive your database. And there is a very good chance that you will at some point need to access the database without going through the present application. == Trouble. (But fortunately, you and the "architect" are long gone, so you will not be there to clean up the mess.) In other words: Do maintain obvious constraints in the database (and in other tiers as well, if you have the time).
In other words: There may be good reasons to add GUID columns to tables, but please don't fall for the temptation to let that lower your ambitions for consistency within the real (== non-GUID) information.
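The point can be illustrated with a minimal sketch, assuming SQLite through Python's `sqlite3` module. The `customer` table and its `customer_no` column are hypothetical. The GUID stays the primary key, while a `NOT NULL UNIQUE` constraint on the real customer identifier keeps rejecting duplicates. The `NOT NULL` on the TEXT primary key is there because SQLite, for historical reasons, allows NULLs in non-integer PRIMARY KEY columns unless told otherwise.

```python
import sqlite3
import uuid


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE customer (
               id          TEXT NOT NULL PRIMARY KEY,  -- surrogate GUID key
               customer_no TEXT NOT NULL UNIQUE,       -- real identifier printed on invoices
               name        TEXT NOT NULL
           )"""
    )
    return db


def add_customer(db, customer_no, name):
    """Insert a customer under a fresh GUID; the UNIQUE constraint still guards customer_no."""
    db.execute(
        "INSERT INTO customer (id, customer_no, name) VALUES (?, ?, ?)",
        (str(uuid.uuid4()), customer_no, name),
    )


db = make_db()
add_customer(db, "C-1001", "Acme Ltd")
try:
    # A new GUID alone would let this duplicate in; the real key rejects it.
    add_customer(db, "C-1001", "ACME Limited")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```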
Answer by Mendelt
The main advantages are that you can create unique IDs without connecting to the database, and IDs are globally unique, so you can easily combine data from different databases. These seem like small advantages but have saved me a lot of work in the past.
The main disadvantages are that a bit more storage is needed (not a problem on modern systems) and the IDs are not really human-readable. This can be a problem when debugging.
There are some performance problems, like index fragmentation. But those are easily solvable (COMB GUIDs by Jimmy Nilsson: http://www.informit.com/articles/article.aspx?p=25862)
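The idea behind COMB GUIDs is to replace part of the random bits with a timestamp, so that newly generated keys sort near each other instead of landing all over the index. Nilsson's original targets SQL Server and places the timestamp in the trailing bytes, because of how SQL Server orders `uniqueidentifier` values. The minimal sketch below instead puts a Unix-millisecond timestamp in the leading bytes. That layout was later standardized as UUIDv7 in RFC 9562, and it suits databases that compare UUIDs byte by byte. The function name is hypothetical, and keys generated within the same millisecond are not ordered relative to each other.

```python
import os
import time
import uuid


def time_ordered_uuid(ms=None):
    """Build a UUIDv7-style value: 48-bit Unix-millisecond timestamp, then random bits."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    raw = bytearray(ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


earlier = time_ordered_uuid(ms=1_700_000_000_000)
later = time_ordered_uuid(ms=1_700_000_000_001)
print(earlier < later, earlier.version)  # True 7
```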
Edit: merged my two answers to this question.
@Matt Sheppard I think he means that you can duplicate rows with different GUIDs as primary keys. This is an issue with any kind of surrogate key, not just GUIDs. And like he said, it is easily solved by adding meaningful unique constraints to non-key columns. The alternative is to use a natural key, and those have real problems.
Answer by Andrei Rînea
Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs, the performance will go through the floor. Been there :(
Answer by Troels Arvin
GUIDs may cause you a lot of trouble in the future if they are used as "uniqifiers", letting duplicated data get into your tables. If you want to use GUIDs, please consider still maintaining UNIQUE-constraints on other column(s).
Answer by WIDBA
One other small issue to consider when using GUIDs as primary keys, if you are also using that column as a clustered index (a relatively common practice): you are going to take a hit on insert, because GUIDs are not sequential in any way, so there will be page splits, etc. when you insert. Just something to consider if the system is going to have high IO...
Answer by wener
The Cost of GUIDs as Primary Keys (SQL Server 2000)
Myths, GUID vs. Autoincrement (MySQL 5)
This is really what you want.
GUID Pros
- Unique across every table, every database, every server
- Allows easy merging of records from different databases
- Allows easy distribution of databases across multiple servers
- You can generate IDs anywhere, instead of having to roundtrip to the database
- Most replication scenarios require GUID columns anyway
GUID Cons
- At 16 bytes, it is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
- Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
- The generated GUIDs should be partially sequential for best performance (e.g., newsequentialid() on SQL Server 2005) and to enable use of clustered indexes
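The arithmetic behind the size point can be sketched quickly in Python. A UUID is 16 bytes in binary form (36 characters when stored as text), against 4 bytes for a 32-bit integer. The per-million-rows figure counts key bytes only and ignores page and row overhead. In engines where secondary indexes carry the clustered key (for example InnoDB, and SQL Server tables with a clustered index), the extra width is paid again in every secondary index. It is also paid in every foreign key column that references the table.

```python
import struct
import uuid

key = uuid.uuid4()
sizes = {
    "INT": struct.calcsize("<i"),     # 4 bytes
    "BIGINT": struct.calcsize("<q"),  # 8 bytes
    "UUID (binary)": len(key.bytes),  # 16 bytes
    "UUID (text)": len(str(key)),     # 36 characters: 32 hex digits plus 4 hyphens
}
for name, size in sizes.items():
    print(f"{name:14} {size:3}")

# Extra bytes per key occurrence when moving from INT to a binary UUID,
# and what that adds up to for one index over a million rows.
extra_per_key = sizes["UUID (binary)"] - sizes["INT"]
print(extra_per_key * 1_000_000 / 1e6, "MB")  # 12.0 MB
```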
Answer by Antti Haapala
There is one thing that is not really addressed, namely that using random (UUIDv4) IDs as primary keys will harm the performance of the primary key index. It will happen whether or not your table is clustered around the key.
RDBMSs usually enforce the uniqueness of primary keys, and support lookups by key, with a structure called a B-tree, which is a search tree with a large branching factor (a binary search tree has a branching factor of 2). Now, a sequential integer ID causes the inserts to occur on just one side of the tree, leaving most of the leaf nodes untouched. Adding random UUIDs causes the insertions to split leaf nodes all over the index.
Likewise, if the data stored is mostly temporal, it is often the case that the most recent data needs to be accessed and joined against the most. With random UUIDs, these access patterns do not benefit from this locality and will hit more index rows, thereby needing more of the index pages in memory. With sequential IDs, if the most recent data is needed the most, the hot index pages require less RAM.
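The effect described in this answer can be illustrated with a toy model of a B-tree's leaf level. This is a minimal sketch, not how any particular engine behaves, and all of its parameters are hypothetical assumptions:

- Leaf pages hold 64 keys.
- A full page is split in half.
- An insert past the end of the rightmost page starts a fresh page instead of halving the old one, an optimization several engines apply in some form.

The `simulate` function reports three things: how many leaf pages exist, how many splits happened, and how many distinct pages the last 1,000 inserts touched. The last figure is a rough proxy for how much of the index must stay hot in memory.

```python
import bisect
import uuid


def simulate(keys, capacity=64, window=1000):
    """Insert keys into a toy leaf level of a B-tree.

    Returns (leaf pages, page splits, distinct pages touched by the last `window` inserts).
    """
    pages = [[]]  # leaf pages, each a sorted list of keys, in key order
    mins = []     # mins[i] is the smallest key on pages[i]
    splits = 0
    touched = []  # identity of the page that received each insert

    for key in keys:
        # Route the key to the last page whose smallest key is <= key.
        i = max(bisect.bisect_right(mins, key) - 1, 0)
        page = pages[i]
        bisect.insort(page, key)
        if mins:
            mins[i] = page[0]
        else:
            mins.append(key)
        touched.append(id(page))

        if len(page) > capacity:
            if i == len(pages) - 1 and page[-1] == key:
                right = [page.pop()]  # append past the end: open a new rightmost page
            else:
                half = len(page) // 2  # ordinary split: move the upper half out
                right = page[half:]
                del page[half:]
            pages.insert(i + 1, right)
            mins.insert(i + 1, right[0])
            splits += 1

    return len(pages), splits, len(set(touched[-window:]))


n = 20_000
print("sequential:", simulate(range(n)))  # (313, 312, 17)
print("random v4: ", simulate([uuid.uuid4().int for _ in range(n)]))
```

In this model, sequential keys fill every page completely, and the last 1,000 inserts land on 17 pages at the right edge. Random UUIDv4 keys leave pages partly empty after splits, so the index needs more pages. Their recent inserts are also spread across most of the leaf level.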