Original question: http://stackoverflow.com/questions/63090/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Surrogate vs. natural/business keys
Asked by Manrico Corazzi
Here we go again, the old argument still arises...
Is it better to use a business key as the primary key, or to use a surrogate id (e.g. a SQL Server identity column) with a unique constraint on the business key field?
Please, provide examples or proof to support your theory.
Accepted answer by Ted
Both. Have your cake and eat it.
Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one such constraint.
If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.
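The "both" approach can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the `employee` table and the `badge_no` business key are made-up names, not something from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
        badge_no    TEXT NOT NULL UNIQUE,  -- business key (hypothetical column)
        full_name   TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO employee (badge_no, full_name) VALUES ('B-001', 'Ann')")


def insert_duplicate_badge(connection):
    """Try to insert a second row with the same business key."""
    try:
        connection.execute(
            "INSERT INTO employee (badge_no, full_name) VALUES ('B-001', 'Bob')"
        )
    except sqlite3.IntegrityError:
        return False  # the UNIQUE constraint rejected the duplicate
    return True


duplicate_accepted = insert_duplicate_badge(conn)
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_accepted, row_count)  # False 1
```

Without the UNIQUE constraint the second insert would succeed, because the surrogate id alone says nothing about business-level duplicates.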
Answer by Jay Shepherd
Just a few reasons for using surrogate keys:
- Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is no meaning tied to the value.
- Convention: Allows you to have a standardized primary key column naming convention rather than having to think about how to join tables with various names for their PKs.
- Speed: Depending on the PK value and type, an integer surrogate key may be smaller and faster to index and search.
Answer by Tony Andrews
It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...
A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:
select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';
against:
select sum(t.hours)
from timesheets t
join departments d on d.dept_id = t.dept_id
join timesheet_statuses s on s.status_id = t.status_id
join projects p on p.project_id = t.project_id
join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';
Unless anyone seriously thinks the following is a good idea?:
select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89
and t.project_id = 1253
and t.task_id = 77;
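To try the comparison above on real data, here is a cut-down sketch using Python's sqlite3 module. It keeps only the department lookup; the `_nat` and `_sur` table suffixes and the sample rows are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Variant 1: the lookup table is keyed by its meaningful code.
    CREATE TABLE departments_nat (dept_code TEXT PRIMARY KEY);
    CREATE TABLE timesheets_nat (
        dept_code TEXT NOT NULL REFERENCES departments_nat (dept_code),
        hours     REAL NOT NULL
    );

    -- Variant 2: the same data keyed by a surrogate id.
    CREATE TABLE departments_sur (
        dept_id   INTEGER PRIMARY KEY,
        dept_code TEXT NOT NULL UNIQUE
    );
    CREATE TABLE timesheets_sur (
        dept_id INTEGER NOT NULL REFERENCES departments_sur (dept_id),
        hours   REAL NOT NULL
    );

    INSERT INTO departments_nat VALUES ('HR'), ('IT');
    INSERT INTO timesheets_nat VALUES ('HR', 4.0), ('HR', 3.5), ('IT', 8.0);

    INSERT INTO departments_sur VALUES (1, 'HR'), (2, 'IT');
    INSERT INTO timesheets_sur VALUES (1, 4.0), (1, 3.5), (2, 8.0);
    """
)

# Natural key: the filter column is already in the timesheet row.
natural_total = conn.execute(
    "SELECT SUM(hours) FROM timesheets_nat WHERE dept_code = 'HR'"
).fetchone()[0]

# Surrogate key: a join is needed to get from the code to the id.
surrogate_total = conn.execute(
    """
    SELECT SUM(t.hours)
    FROM timesheets_sur t
    JOIN departments_sur d ON d.dept_id = t.dept_id
    WHERE d.dept_code = 'HR'
    """
).fetchone()[0]

print(natural_total, surrogate_total)  # 7.5 7.5
```

Both queries return the same total; the difference is only in how many tables have to be touched to express the filter.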
"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.
See also: my answer to another question
Answer by tzot
Surrogate keys (typically integers) have the added value of making your table relations faster and more economical in storage and update speed. Even better, foreign keys do not need to be updated when you use surrogate keys, in contrast with business key fields, which do change now and then.
A table's primary key should be used for uniquely identifying the row, mainly for join purposes. Think of a Persons table: names can change, and they're not guaranteed to be unique.
Think Companies: you're a happy Merkin company doing business with other companies in Merkia. You are clever enough not to use the company name as the primary key, so you use Merkia's government's unique company ID in its entirety of 10 alphanumeric characters. Then Merkia changes the company IDs because they thought it would be a good idea. It's ok, you use your db engine's cascaded updates feature, for a change that shouldn't involve you in the first place. Later on, your business expands, and now you work with a company in Freedonia. Freedonian company IDs are up to 16 characters. You need to enlarge the company ID primary key (and the foreign key fields in Orders, Issues, MoneyTransfers etc.), and add a Country field to the primary key (and to the foreign keys). Ouch! Civil war in Freedonia: it has split into three countries. The country name of your associate should be changed to the new one; cascaded updates to the rescue. BTW, what's your primary key? (Country, CompanyID) or (CompanyID, Country)? The latter helps joins, the former avoids another index (or perhaps many, should you want your Orders grouped by country too).
None of this is proof, but it is an indication that a surrogate key that uniquely identifies a row for all uses, including join operations, is preferable to a business key.
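The Merkia scenario can be sketched with Python's sqlite3 module. The table layout, the `gov_id` column, and the ID values are hypothetical; the point is only that child rows are untouched when the government-issued ID changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE companies (
        company_id INTEGER PRIMARY KEY,   -- surrogate key used for joins
        country    TEXT NOT NULL,
        gov_id     TEXT NOT NULL,         -- government-issued ID (hypothetical)
        UNIQUE (country, gov_id)          -- business rule still enforced
    );
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES companies (company_id)
    );
    INSERT INTO companies VALUES (1, 'Merkia', 'MK00000001');
    INSERT INTO orders VALUES (100, 1), (101, 1);
    """
)

query = "SELECT order_id, company_id FROM orders ORDER BY order_id"
orders_before = conn.execute(query).fetchall()

# The government re-issues the company ID with a longer format.
conn.execute(
    "UPDATE companies SET gov_id = 'MK-2024-000000001' WHERE company_id = 1"
)

orders_after = conn.execute(query).fetchall()
current_gov_id = conn.execute(
    """
    SELECT c.gov_id
    FROM orders o
    JOIN companies c ON c.company_id = o.company_id
    WHERE o.order_id = 100
    """
).fetchone()[0]

print(orders_before == orders_after, current_gov_id)
```

Only one row in `companies` changes; no cascaded update and no widening of foreign key columns is needed in `orders`.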
Answer by Rimantas
A surrogate key will NEVER have a reason to change. I cannot say the same about natural keys. Last names, emails, ISBN numbers - they all can change one day.
Answer by Ken
I hate surrogate keys in general. They should only be used when there is no quality natural key available. It is rather absurd when you think about it, to think that adding meaningless data to your table could make things better.
Here are my reasons:
- When using natural keys, tables are clustered in the way that they are most often searched, which makes queries faster.
- When using surrogate keys you must add unique indexes on the logical key columns. You still need to prevent logically duplicate data. For example, you can't allow two Organizations with the same name in your Organization table even though the PK is a surrogate id column.
- When surrogate keys are used as the primary key it is much less clear what the natural primary keys are. When developing, you want to know what set of columns makes a row unique.
- In one-to-many relationship chains, the logical keys chain together. For example, Organizations have many Accounts and Accounts have many Invoices. So the logical key of Organization is OrgName. The logical key of Accounts is OrgName, AccountID. The logical key of Invoice is OrgName, AccountID, InvoiceNumber.
- When surrogate keys are used, the key chains are truncated because each table only has a foreign key to its immediate parent. For example, the Invoice table does not have an OrgName column. It only has a column for the AccountID. If you want to search for invoices for a given organization, you need to join the Organization, Account, and Invoice tables. If you use logical keys, you can query the Invoice table directly.
- Storing surrogate key values of lookup tables causes tables to be filled with meaningless integers. To view the data, complex views must be created that join to all of the lookup tables. A lookup table is meant to hold a set of acceptable values for a column. It should not be codified by storing an integer surrogate key instead. There is nothing in the normalization rules that suggests you should store a surrogate integer instead of the value itself.
- I have three different database books. Not one of them shows using surrogate keys.
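The chained logical keys described in the list above might look like this in practice. This is a sketch with Python's sqlite3 module; the snake_case column names and the sample organizations are assumptions based on the OrgName / AccountID / InvoiceNumber example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE organization (org_name TEXT PRIMARY KEY);

    CREATE TABLE account (
        org_name   TEXT NOT NULL REFERENCES organization (org_name),
        account_id INTEGER NOT NULL,
        PRIMARY KEY (org_name, account_id)
    );

    -- The parent's key columns are carried down into the child's key.
    CREATE TABLE invoice (
        org_name       TEXT NOT NULL,
        account_id     INTEGER NOT NULL,
        invoice_number INTEGER NOT NULL,
        PRIMARY KEY (org_name, account_id, invoice_number),
        FOREIGN KEY (org_name, account_id)
            REFERENCES account (org_name, account_id)
    );

    INSERT INTO organization VALUES ('Acme'), ('Globex');
    INSERT INTO account VALUES ('Acme', 1), ('Acme', 2), ('Globex', 1);
    INSERT INTO invoice VALUES
        ('Acme', 1, 1001), ('Acme', 2, 1002), ('Globex', 1, 1003);
    """
)

# No join needed: the organization name is part of the invoice key.
acme_invoices = [
    row[0]
    for row in conn.execute(
        "SELECT invoice_number FROM invoice "
        "WHERE org_name = 'Acme' ORDER BY invoice_number"
    )
]
print(acme_invoices)  # [1001, 1002]
```

The trade-off is visible in the schema: every level of the chain repeats the key columns of the levels above it.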
Answer by mwnsiri
I want to share my experience with you on this endless war :D on the natural vs surrogate key dilemma. I think that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. So depending on your situation, it might be more relevant to choose one method or the other.
As it seems that many people present surrogate keys as the almost perfect solution and natural keys as the plague, I will focus on the other point of view's arguments:
Disadvantages of surrogate keys
Surrogate keys are:
- Source of performance problems:
  - They are usually implemented using auto-incremented columns, which means:
    - A round-trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo-like algorithms, but those methods have their own drawbacks).
    - If one day you need to move your data from one schema to another (it happens quite regularly in my company at least), then you might encounter Id collision problems. And yes, I know that you can use UUIDs, but those require 32 hexadecimal digits! (If you care about database size, this can be an issue.)
    - If you are using one sequence for all your surrogate keys then, for sure, you will end up with contention on your database.
- Error prone. A sequence has a max_value limit so, as a developer, you have to pay attention to the following points:
  - You must cycle your sequence (when the max value is reached it goes back to 1, 2, ...).
  - If you are using the sequence as an ordering (over time) of your data, then you must handle the case of cycling (the row with Id 1 might be newer than the row with Id max-value - 1).
  - Make sure that your code (and even your client interfaces, which should not happen as it is supposed to be an internal Id) supports the 32-bit/64-bit integers that you used to store your sequence values.
- They don't guarantee non-duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
- More in Wikipedia...
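The Id collision point can be reproduced in a few lines. This is only a sketch with Python's sqlite3 module: two in-memory databases stand in for the two schemas, and the `customer` table and `make_db` helper are made up for the illustration.

```python
import sqlite3


def make_db(names):
    """Create a database whose customer ids are auto-generated from 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)", [(n,) for n in names]
    )
    return conn


source = make_db(["Ann", "Bob"])   # gets ids 1 and 2
target = make_db(["Cid", "Dee"])   # also gets ids 1 and 2

# Naive move: copy the rows together with their generated ids.
collisions = 0
for row in source.execute("SELECT id, name FROM customer").fetchall():
    try:
        target.execute("INSERT INTO customer (id, name) VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        collisions += 1  # the id is already taken in the target

# Workaround: let the target generate fresh ids for the moved rows.
for (name,) in source.execute("SELECT name FROM customer").fetchall():
    target.execute("INSERT INTO customer (name) VALUES (?)", (name,))

total = target.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(collisions, total)  # 2 4
```

The workaround changes the ids of the moved rows, so any foreign keys that pointed at the old ids would have to be remapped as part of the move.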
Myths on natural keys
- Composite keys are less efficient than surrogate keys. No! It depends on the database engine used.
- Natural keys don't exist in real life. Sorry, but they do exist! In the aviation industry, for example, the following tuple will always be unique for a given scheduled flight: (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard, then this set of data is a [good] natural key candidate.
- Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a 4-column primary key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.
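As a rough illustration of the scheduled-flight tuple used as a composite natural key, here is a sketch with Python's sqlite3 module. The column types, the `try_insert` helper, and the "XX" airline code are assumptions, not taken from any aviation standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scheduled_flight (
        airline            TEXT NOT NULL,
        departure_date     TEXT NOT NULL,
        flight_number      INTEGER NOT NULL,
        operational_suffix TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (airline, departure_date, flight_number, operational_suffix)
    )
    """
)


def try_insert(flight):
    """Insert one flight tuple; report whether the key accepted it."""
    try:
        conn.execute("INSERT INTO scheduled_flight VALUES (?, ?, ?, ?)", flight)
    except sqlite3.IntegrityError:
        return False
    return True


first = try_insert(("XX", "2024-05-01", 123, ""))
repeat = try_insert(("XX", "2024-05-01", 123, ""))    # same tuple again
variant = try_insert(("XX", "2024-05-01", 123, "A"))  # differs only in suffix
print(first, repeat, variant)  # True False True
```

Here the primary key itself rejects the logically duplicate flight, with no extra unique index needed.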
Conclusion
Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.
Hope that this helped someone!
Answer by Iain Holder
Always use a key that has no business meaning. It's just good practice.
EDIT: I was trying to find a link to it online, but I couldn't. However, 'Patterns of Enterprise Application Architecture' [Fowler] has a good explanation of why you shouldn't use anything other than a key that has no meaning beyond being a key. It boils down to the fact that a key should have one job and one job only.
Answer by Derek Lawless
Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.
(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)
I'm a fan of using UIDs for surrogate keys when suitable. The major win with them is that you know the key in advance: for example, you can create an instance of a class with the ID already set and guaranteed to be unique, whereas with, say, an integer key you'll need to default to 0 or -1 and update it to an appropriate value when you save/update.
UIDs have penalties in terms of lookup and join speed though, so whether they're desirable depends on the application in question.
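A small sketch of knowing the key in advance, using Python's standard uuid and sqlite3 modules. It assumes the UID is a random UUID (the answer does not say which kind of UID is meant), and the `Customer` class is hypothetical.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    # The id is generated client-side, before any database round-trip.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

customer = Customer(name="Ann")
id_before_save = customer.id  # already usable, e.g. for related objects

conn.execute(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    (customer.id, customer.name),
)
stored_id = conn.execute(
    "SELECT id FROM customer WHERE name = 'Ann'"
).fetchone()[0]

print(stored_id == id_before_save)  # True
```

With an auto-incremented integer key, the object would instead carry a placeholder id until the insert returned the generated value.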
Answer by Mark Embling
Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).
An example might be a DB of cars - at first glance, you might think that the licence plate could be used as the key. But plates can be changed, so that would be a bad idea. You wouldn't really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.