SQL 表中主键的最佳实践是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/337503/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What's the best practice for primary keys in tables?
提问by Lloyd Cotten
When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:
在设计表格时,我养成了让一列唯一并作为主键的习惯。根据要求,这可以通过三种方式实现:
- Identity integer column that auto increments.
- Unique identifier (GUID)
- A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column
- 自动递增的标识整数列。
- 唯一标识符 (GUID)
- 可用作行标识符列的短字符 (x) 或整数(或其他相对较小的数字类型)列
Number 3 would be used for fairly small lookup, mostly read tables that might have a unique static length string code, or a numeric value such as a year or other number.
数字 3 将用于相当小的查找,主要是读取可能具有唯一静态长度字符串代码或数字值(例如年份或其他数字)的表。
For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.
在大多数情况下,所有其他表将具有自动递增的整数或唯一标识符主键。
The Question :-)
问题:-)
I have recently started working with databases that have no consistent row identifier and primary keys are currently clustered across various columns. Some examples:
我最近开始使用没有一致行标识符和主键当前聚集在各个列中的数据库。一些例子:
- datetime/character
- datetime/integer
- datetime/varchar
- char/nvarchar/nvarchar
- 日期时间/字符
- 日期时间/整数
- 日期时间/varchar
- 字符/nvarchar/nvarchar
Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.
对此有有效的案例吗?对于这些情况,我总是会定义一个身份或唯一标识符列。
In addition there are many tables without primary keys at all. What are the valid reasons, if any, for this?
此外还有很多表根本没有主键。这样做的正当理由是什么(如果有的话)?
I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.
我试图理解为什么表格被设计成这样,这对我来说似乎是一团糟,但也许有很好的理由。
A third question to sort of help me decipher the answers: In cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly in regards to performance, maintenance, administration, etc.?
第三个问题可以帮助我解读答案:在使用多列组成复合主键的情况下,这种方法与代理/人工键相比是否有特定优势?我主要考虑性能、维护、管理等方面的问题?
采纳答案by Logicalmind
I follow a few rules:
我遵循一些规则:
- Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.
- Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.
- Do NOT use "your problem primary key" as your logic model primary key. For example passport number, social security number, or employee contract number as these "primary key" can change for real world situations.
- 主键应尽可能小。首选数字类型,因为数字类型以比字符格式更紧凑的格式存储。这是因为大多数主键将是另一个表中的外键以及在多个索引中使用。您的键越小,索引越小,您将使用的缓存中的页面就越少。
- 主键永远不应该改变。更新主键应该永远是不可能的。这是因为它最有可能在多个索引中使用并用作外键。更新单个主键可能会导致更改的连锁反应。
- 不要使用“您的问题主键”作为您的逻辑模型主键。例如,护照号码、社会保险号码或员工合同号码作为这些“主键”可能会因实际情况而改变。
On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
关于代理与自然键,我参考了上面的规则。如果自然键很小并且永远不会改变,则可以将其用作主键。如果自然键很大或可能会改变,我会使用代理键。如果没有主键,我仍然会创建一个代理键,因为经验表明您总是会向架构中添加表,并希望将主键放置到位。
回答by Tony Andrews
Natural verses artifical keys is a kind of religious debate among the database community - see this articleand others it links to. I'm neither in favour of alwayshaving artifical keys, nor of neverhaving them. I would decide on a case-by-case basis, for example:
Natural verses 人工键是数据库社区中的一种宗教辩论 - 请参阅本文及其链接的其他文章。我既不赞成总是拥有人造钥匙,也不赞成从不拥有它们。我会根据具体情况做出决定,例如:
- US States: I'd go for state_code ('TX' for Texas etc.), rather than state_id=1 for Texas
- Employees: I'd usually create an artifical employee_id, because it's hard to find anything else that works. SSN or equivalent may work, but there could be issues like a new joiner who hasn't supplied his/her SSN yet.
- Employee Salary History: (employee_id, start_date). I would notcreate an artifical employee_salary_history_id. What point would it serve (other than "foolish consistency")
- 美国各州:我会选择 state_code(德克萨斯州的“TX”等),而不是德克萨斯州的 state_id=1
- 员工:我通常会创建一个人工的员工 ID,因为很难找到其他有效的方法。SSN 或同等身份可能有效,但可能会出现问题,例如新加入者尚未提供他/她的 SSN。
- 员工工资历史:(employee_id,start_date)。我不会创建一个人为的employee_salary_history_id。它有什么意义(除了“愚蠢的一致性”)
Wherever artificial keys are used, you should always also declare unique constraints on the natural keys. For example, use state_id if you must, but then you'd better declare a unique constraint on state_code, otherwise you are sure to eventually end up with:
无论在何处使用人工键,您还应该始终声明对自然键的唯一约束。例如,如果必须的话,使用 state_id,但是你最好在 state_code 上声明一个唯一的约束,否则你最终肯定会得到:
state_id state_code state_name
137 TX Texas
... ... ...
249 TX Texas
回答by WW.
Just an extra comment on something that is often overlooked. Sometimes not using a surrogate key has benefits in the child tables. Let's say we have a design that allows you to run multiple companies within the one database (maybe it's a hosted solution, or whatever).
只是对经常被忽视的事情的额外评论。有时不使用代理键对子表有好处。假设我们有一个设计,允许您在一个数据库中运行多个公司(也许它是一个托管解决方案,或者其他什么)。
Let's say we have these tables and columns:
假设我们有这些表和列:
Company:
CompanyId (primary key)
CostCenter:
CompanyId (primary key, foreign key to Company)
CostCentre (primary key)
CostElement
CompanyId (primary key, foreign key to Company)
CostElement (primary key)
Invoice:
InvoiceId (primary key)
CompanyId (primary key, in foreign key to CostCentre, in foreign key to CostElement)
CostCentre (in foreign key to CostCentre)
CostElement (in foreign key to CostElement)
In case that last bit doesn't make sense, Invoice.CompanyId
is part of two foreign keys, one to the CostCentretable and one to the CostElementtable. The primary key is (InvoiceId, CompanyId).
如果最后一位没有意义,它Invoice.CompanyId
是两个外键的一部分,一个到CostCentre表,一个到CostElement表。主键是 ( InvoiceId, CompanyId)。
In this model, it's not possible to screw-up and reference a CostElementfrom one company and a CostCentrefrom another company. If a surrogate key was used on the CostElementand CostCentre tables, it would be.
在这个模型中,这是不可能的螺丝,并引用CostElement从一个公司和一个CostCentre从另一家公司。如果在CostElement和CostCentre 表上使用了代理键,那就是。
The fewer chances to screw up, the better.
搞砸的机会越少越好。
回答by Paul
I avoid using natural keys for one simple reason -- human error. Although natural unique identifiers are often available (SSN, VIN, Account Number, etc.), they require a human to enter them correctly. If you're using SSNs as a primary key, someone transposes a couple of numbers during data entry, and the error isn't discovered immediately, then you're faced with changing your primary key.
我避免使用自然键的原因很简单——人为错误。尽管自然唯一标识符通常可用(SSN、VIN、帐号等),但它们需要人工正确输入。如果您使用 SSN 作为主键,有人在数据输入期间调换了几个数字,并且没有立即发现错误,那么您将面临更改主键的问题。
My primary keys are all handled by the database program in the background and the user is never aware of them.
我的主键都由后台的数据库程序处理,用户永远不会知道它们。
回答by DonOctavioDelFlores
There′s no problem in making your primary key from various fields, that's a Natural Key.
从各个领域制作主键没有问题,那就是Natural Key。
You can use a Identity column (associated with a unique index on the candidate fields) to make a Surrogate Key.
您可以使用 Identity 列(与候选字段上的唯一索引相关联)来创建Surrogate Key。
That′s an old discussion. I prefer surrogate keys in most situations.
这是一个古老的讨论。在大多数情况下,我更喜欢代理键。
But there′s no excuse for the lack of a key.
但是没有钥匙是没有理由的。
RE: EDIT
回复:编辑
Yeah, there′s a lot of controversy about that :D
是的,对此有很多争议 :D
I don′t see any obvious advantage on natural keys, besides the fact that they are the natural choice. You will always think in Name, SocialNumber- or something like that - instead of idPerson.
除了它们是自然选择的事实之外,我没有看到自然键的任何明显优势。您总是会想到Name、SocialNumber- 或类似的东西 - 而不是idPerson。
Surrogate keys are the answer to some of the problems that natural keys have (propagating changes for example).
代理键是自然键所存在的一些问题的答案(例如传播更改)。
As you get used to surrogates, it seems more clean, and manageable.
当你习惯了代理时,它看起来更干净,更易于管理。
But in the end, you′ll find out that it's just a matter of taste - or mindset -. People "think better" with natural keys, and others don′t.
但最终,您会发现这只是品味或心态的问题。人们用自然键“思考得更好”,而其他人则不然。
回答by Patrick Desjardins
Tables should have a primary key all the time. When it doesn't it should have been an AutoIncrement fields.
表应该始终有一个主键。如果不是,它应该是一个 AutoIncrement 字段。
Sometime people omit primary key because they transfer a lot of data and it might slow down (depend of the database) the process. BUT, it should be added after it.
有时人们会省略主键,因为他们传输了大量数据,这可能会减慢(取决于数据库)过程。但是,它应该添加在它之后。
Some one comment about link table, this is right, it's an exception BUT fields should be FK to keep the integrity, and is some case those fields can be primary keys too if duplicate in links is not authorized... but to keep in a simple form because exception is something often in programming, primary key should be present to keep the integrity of your data.
关于链接表的一些评论,这是对的,这是一个例外,但字段应该是 FK 以保持完整性,并且在某些情况下,如果链接中的重复项未被授权,这些字段也可以是主键......但要保持在一个简单的形式因为异常是编程中经常出现的东西,所以应该存在主键以保持数据的完整性。
回答by RayLuo
Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.
除了所有这些好的答案,我只想分享我刚刚阅读的一篇好文章,主键辩论。
Just to quote a few points:
仅引用几点:
The developer must apply a few rules when choosing a primary key for each table:
开发人员在为每个表选择主键时必须应用一些规则:
- The primary key must uniquely identify each record.
- A record's primary-key value can't be null.
- The primary key-value must exist when the record is created.
- The primary key must remain stable—you can't change the primary-key field(s).
- The primary key must be compact and contain the fewest possible attributes.
- The primary-key value can't be changed.
- 主键必须唯一标识每条记录。
- 记录的主键值不能为空。
- 创建记录时,主键值必须存在。
- 主键必须保持稳定——您不能更改主键字段。
- 主键必须紧凑并且包含尽可能少的属性。
- 主键值不能更改。
Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You better read through that article, it is worth your time!)
自然键(倾向于)打破规则。代理键符合规则。(你最好通读那篇文章,值得你花时间!)
回答by Luke
What is special about the primary key?
主键有什么特别之处?
What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and table relationships should drive the key that is used.
模式中的表的用途是什么?表键的用途是什么?主键有什么特别之处?围绕主键的讨论似乎忽略了主键是表的一部分,而该表是模式的一部分。最适合表和表关系的应该驱动所使用的键。
Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).
表(和表关系)包含有关您希望记录的信息的事实。这些事实应该是自成一体的、有意义的、易于理解且不矛盾的。从设计的角度来看,从模式中添加或删除的其他表不应影响相关表。必须有一个目的来存储仅与信息本身相关的数据。了解表中存储的内容不需要进行科学研究项目。为同一目的存储的任何事实都不应存储超过一次。键是被记录的唯一信息的全部或部分,主键是特别指定的键,它是表的主要访问点(即应该选择它是为了数据的一致性和使用,而不仅仅是插入表现)。
- ASIDE: The unfortunately side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table at a time designs. “Partial records” are created in tables and data filled in over time. Multi-table interaction is avoided or when used causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.
- 旁白:不幸的是,大多数由应用程序程序员(我有时就是这样)设计和开发的数据库的副作用是,最适合应用程序或应用程序框架的往往会驱动表的主键选择。这导致了整数和 GUID 键(因为它们易于用于应用程序框架)和单片表设计(因为它们减少了表示内存中数据所需的应用程序框架对象的数量)。这些应用程序驱动的数据库设计决策在大规模使用时会导致严重的数据一致性问题。以这种方式设计的应用程序框架自然会导致一次表设计。“部分记录”是在表格中创建的,随着时间的推移填充数据。避免多表交互,或者在应用程序运行不正常时使用导致数据不一致。这些设计导致数据无意义(或难以理解)、数据分散在表中(您必须查看其他表才能理解当前表)和重复数据。
It was said that primary keys should be as small as necessary. I would says that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.
据说主键应该尽可能小。我会说密钥应该只在必要时大。应避免向表中随机添加无意义的字段。从随机添加的无意义字段中生成键更糟糕,尤其是当它破坏了从另一个表到非主键的连接依赖时。这只有在表中没有好的候选键时才合理,但如果用于所有表,这种情况肯定是模式设计不佳的标志。
It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed' to ‘Cancelled' would definitely corrupt the data). What should always be out of the question is destroying data meaning.
也有人说主键永远不应该改变,因为更新主键应该永远是不可能的。但是更新与删除后插入相同。按照这个逻辑,你永远不应该用一个键从表中删除一条记录,然后用第二个键添加另一条记录。添加代理主键不会消除表中存在另一个键的事实。如果其他表通过代理键(例如,具有代理键的状态表的状态描述从“已处理”更改为“已取消”),则更新表的非主键可能会破坏数据的含义'肯定会破坏数据)。永远不可能破坏数据的意义。
Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.
话虽如此,我很感激当今企业中存在的许多设计不佳的数据库(无意义的代理键控数据损坏的 1NF 庞然大物),因为这意味着了解正确数据库设计的人们有无穷无尽的工作. 但令人难过的是,它有时确实让我觉得自己像西西弗斯,但我敢打赌他有一个 401k(在崩溃之前)。对于重要的数据库设计问题,远离博客和网站。如果您正在设计数据库,请查找 CJ Date。您也可以为 SQL Server 引用 Celko,但前提是您先捏住鼻子。在 Oracle 方面,请参考 Tom Kyte。
回答by Rodney P. Barbati
Here are my own rule of thumbs I have settled on after 25+ years of development experience.
以下是我在 25 年以上的开发经验后确定的经验法则。
- All tables should have a single column primary key that auto increments.
- Include it in any view that is meant to be updateable
- The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, or an account number or an employee id or any other information that is meaningful to your application. It is merely a unique key associated with an entity.
- 所有表都应该有一个自动递增的单列主键。
- 将其包含在任何可更新的视图中
- 主键在您的应用程序上下文中不应具有任何意义。这意味着它不应是 SKU、帐号、员工 ID 或任何其他对您的应用程序有意义的信息。它只是与实体关联的唯一键。
The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.
主键由数据库用于优化目的,您的应用程序不应将主键用于识别特定实体或与特定实体相关的任何其他用途。
Always having a single value primary key makes performing UPSERTs very straightforward.
始终拥有一个单值主键使得执行 UPSERT 非常简单。
Use additional indices to support multi-column keys which have meaning in your application.
使用附加索引来支持在您的应用程序中有意义的多列键。
回答by James Curran
A natural key, if available, is usually best. So, if datetime/char uniquelyidentifies the row and both parts are meaningful to the row, that's great.
如果可用,自然键通常是最好的。因此,如果 datetime/char唯一标识该行并且这两个部分对该行都有意义,那就太好了。
If just the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well just go with an identify field.
如果只是日期时间是有意义的,并且只是附加了字符以使其唯一,那么您最好只使用标识字段。