Original source: http://stackoverflow.com/questions/1207983/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
in general, should every table in a database have an identity field to use as a PK?
Asked by jcollum
This seems like a duplicate even as I ask it, but I searched and didn't find it. It seems like a good question for SO -- even though I'm sure I can find it on many blogs etc. out there. SO will have more of a debate than you can get on a blog.
I'm running into an issue with a join: getting back too many records. I think of this as "expansion". I added a table to the set of joins and the number of rows expanded, way too much. Usually when this happens I add a select of all the ID fields that are involved in the join. That way it's pretty obvious where the expansion is happening and I can change the ON of the join to fix it. Except in this case, the table that I added doesn't have an ID field. To me, this is a problem. But perhaps I'm wrong.
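One way to locate this kind of expansion when the added table has no ID column is to count how often each join value occurs in it. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for the real database; the orders and order_notes tables and their rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    -- Hypothetical table with no key at all: nothing stops repeated rows.
    CREATE TABLE order_notes (order_id INTEGER NOT NULL, note TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO order_notes VALUES (1, 'gift wrap'), (1, 'gift wrap'), (2, 'fragile');
""")

# Two orders go in, but the join returns three rows.
joined = conn.execute(
    "SELECT o.order_id, n.note FROM orders o "
    "JOIN order_notes n ON n.order_id = o.order_id"
).fetchall()

# Join-column values that occur more than once in the keyless table
# are where the extra rows come from.
fan_out = conn.execute(
    "SELECT order_id, COUNT(*) FROM order_notes "
    "GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()

print(len(joined))
print(fan_out)
```

The GROUP BY ... HAVING COUNT(*) > 1 query works whether or not the table has an ID column, so it can replace the "select all the ID fields" habit in this situation.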
The question: should every table in a database have an IDENTITY field that's used as the PK? Are there any drawbacks to having an ID field in every table? What if you're reasonably sure this table will never be used in a PK/FK relationship?
Related, but not duplicate: When having an identity column is not a good idea?
Apparently this debate has been going on for a while. Shoulda known.
This post (surrogate vs. natural keys) is also relevant.
Answered by Quassnoi
There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY.
Every table (except in rare cases) should have a PRIMARY KEY, that is, a value or a set of values that uniquely identifies a row.
See here for a discussion of why.
IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are inherently UNIQUE.
However, no UNIQUE constraint or UNIQUE index is automatically created on an IDENTITY column, and after issuing SET IDENTITY_INSERT ON it's possible to insert duplicate values into an IDENTITY column, unless it has been explicitly UNIQUE constrained.
The IDENTITY column does not necessarily have to be a PRIMARY KEY, but most often it's used to fill surrogate PRIMARY KEYs.
It may or may not be useful in any particular case.
Therefore, the answer to your question:
The question: should every table in a database have an IDENTITY field that's used as the PK?
is this:
No. There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY.
Three cases come to mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY:
- If your PRIMARY KEY is composite (like in many-to-many link tables)
- If your PRIMARY KEY is natural (like a state code)
- If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)
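The three cases above can be sketched in one small schema. This is an illustration only, assuming SQLite through Python's sqlite3 module rather than SQL Server; the state, person and person_state tables are hypothetical, and uuid.uuid4() stands in for NEWID:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Natural key: the state code itself identifies the row.
    CREATE TABLE state (code TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
    -- Key that must be unique across databases: a UUID made by the client.
    CREATE TABLE person (id TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
    -- Composite key: a many-to-many link table keyed by the pair it links.
    CREATE TABLE person_state (
        person_id TEXT NOT NULL REFERENCES person (id),
        state_code TEXT NOT NULL REFERENCES state (code),
        PRIMARY KEY (person_id, state_code)
    );
""")

person_id = str(uuid.uuid4())
conn.execute("INSERT INTO state VALUES ('CA', 'California')")
conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, "Ada"))
conn.execute("INSERT INTO person_state VALUES (?, 'CA')", (person_id,))

# The composite key rejects a second copy of the same link.
try:
    conn.execute("INSERT INTO person_state VALUES (?, 'CA')", (person_id,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM person_state").fetchone()[0]
print(duplicate_rejected, link_count)
```

In all three tables the key values are chosen by the application and inserted explicitly, which is the situation where an auto-incrementing column adds nothing.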
All these cases imply the following condition:
You shouldn't have IDENTITY when you care about the values of your PRIMARY KEY and explicitly insert them into your table.
Update:
Many-to-many link tables should have the pair of ids of the tables they link as the composite key.
It's a natural composite key which you already have to use (and make UNIQUE), so there is no point in generating a surrogate key for this.
I don't see why you would want to reference a many-to-many link table from any table other than the tables it links, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
This query:
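-- Column types are illustrative; the original gave none.
CREATE TABLE a (id INT NOT NULL PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT NOT NULL PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (a_id INT NOT NULL, b_id INT NOT NULL, PRIMARY KEY (a_id, b_id));
CREATE TABLE business_rule (id INT NOT NULL PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id));

SELECT *
FROM business_rule br
JOIN a
ON a.id = br.a_id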
is much more efficient than this one:
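-- Column types are illustrative; the original gave none.
CREATE TABLE a (id INT NOT NULL PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT NOT NULL PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (id INT NOT NULL PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, UNIQUE (a_id, b_id));
CREATE TABLE business_rule (id INT NOT NULL PRIMARY KEY, ab_id INT NOT NULL, FOREIGN KEY (ab_id) REFERENCES ab (id));

SELECT *
FROM business_rule br
JOIN ab
ON br.ab_id = ab.id
JOIN a
ON a.id = ab.a_id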
, because the second query needs an extra join through ab just to reach a.
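To see that both designs answer the same question with a different number of joins, here is a rough sketch of the two variants side by side. It assumes SQLite through Python's sqlite3 module; the table names follow the answer, while the column types, the foreign keys from ab to a and b, and the sample rows are my own assumptions. It compares results only and does not measure performance:

```python
import sqlite3

COMMON = """
    CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, data TEXT);
    INSERT INTO a VALUES (1, 'a1');
    INSERT INTO b VALUES (10, 'b10');
"""

# Variant 1: link table keyed by the pair; business_rule carries both ids.
composite = sqlite3.connect(":memory:")
composite.execute("PRAGMA foreign_keys = ON")
composite.executescript(COMMON + """
    CREATE TABLE ab (
        a_id INTEGER NOT NULL REFERENCES a (id),
        b_id INTEGER NOT NULL REFERENCES b (id),
        PRIMARY KEY (a_id, b_id)
    );
    CREATE TABLE business_rule (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id)
    );
    INSERT INTO ab VALUES (1, 10);
    INSERT INTO business_rule VALUES (100, 1, 10);
""")
one_join = composite.execute(
    "SELECT br.id, a.data FROM business_rule br JOIN a ON a.id = br.a_id"
).fetchall()

# Variant 2: link table with its own surrogate id; reaching a takes two joins.
surrogate = sqlite3.connect(":memory:")
surrogate.execute("PRAGMA foreign_keys = ON")
surrogate.executescript(COMMON + """
    CREATE TABLE ab (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL REFERENCES a (id),
        b_id INTEGER NOT NULL REFERENCES b (id),
        UNIQUE (a_id, b_id)
    );
    CREATE TABLE business_rule (
        id INTEGER PRIMARY KEY,
        ab_id INTEGER NOT NULL REFERENCES ab (id)
    );
    INSERT INTO ab VALUES (7, 1, 10);
    INSERT INTO business_rule VALUES (100, 7);
""")
two_joins = surrogate.execute(
    "SELECT br.id, a.data FROM business_rule br "
    "JOIN ab ON ab.id = br.ab_id "
    "JOIN a ON a.id = ab.a_id"
).fetchall()

print(one_join, two_joins)
```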
Answered by Greg D
Almost always yes. I generally default to including an identity field unless there's a compelling reason not to. I rarely encounter such reasons, and the cost of the identity field is minimal, so generally I include one.
The only case I can think of off the top of my head where I didn't was a highly specialized database that was used more as a datastore than as a relational database: the DBMS was being used for nearly every feature except significant relational modelling. (It was a high-volume, high-turnover data buffer thing.)
Answered by HLGEM
I'm a firm believer that natural keys are often far worse than artificial keys, because you often have no control over whether they will change, which can cause horrendous data integrity or performance problems.
However, there are some (very few) natural keys that make sense without being an identity field (two-letter state abbreviation comes to mind, it is extremely rare for these official type abbreviations to change.)
Any table which is a join table to model a many to many relationship probably also does not need an additional identity field. Making the two key fields together the primary key will work just fine.
Other than that, I would in general add an identity field to most other tables unless given a compelling reason not to in that particular case. It is bad practice to fail to create a primary key on a table or, if you are using surrogate keys, to fail to place a unique index on the other fields needed to guarantee uniqueness where possible (unless you really enjoy resolving duplicates).
Answered by Jonathan Leffler
No. Whenever you have a table with an artificial identity column, you also need to identify the natural primary key for the table and ensure that there is a unique constraint on that set of columns too so that you don't get two rows that are identical apart from the meaningless identity column by accident.
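The point about pairing an artificial identity column with a unique constraint on the natural key can be shown with a small experiment. A minimal sketch, assuming SQLite through Python's sqlite3 module, where INTEGER PRIMARY KEY AUTOINCREMENT plays the role of an identity column; the customer tables and the email column are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: nothing guards the natural key (email).
    CREATE TABLE customer_loose (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL
    );
    -- Surrogate key plus a unique constraint on the natural key.
    CREATE TABLE customer_strict (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
    );
""")

# Without the constraint, the same email goes in twice under different ids.
for _ in range(2):
    conn.execute("INSERT INTO customer_loose (email) VALUES ('a@example.com')")
loose_rows = conn.execute(
    "SELECT id, email FROM customer_loose ORDER BY id"
).fetchall()

# With the constraint, the second insert is refused.
conn.execute("INSERT INTO customer_strict (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO customer_strict (email) VALUES ('a@example.com')")
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

print(loose_rows)
print(second_insert_rejected)
```

The loose table ends up with two rows that differ only in the meaningless id, which is exactly the accident described above.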
Adding an identity column is not cost free. There is an overhead in adding an unnecessary identity column to a table: typically 4 bytes per row of storage for the identity value, plus a whole extra index (which will probably weigh in at 8-12 bytes per row plus overhead). It also takes slightly longer to work out the most cost-effective query plan because there is an extra index per table. Granted, if the table is small and the machine is big, this overhead is not critical, but for the biggest systems, it matters.
Answered by Mark Roddy
Every table should have some set of field(s) that uniquely identifies each row. Whether or not there is a numeric identifier field separate from the data fields will depend on the domain you are attempting to model. Not all data easily falls into the 'single numeric id' paradigm, and as such it would be inappropriate to force it. That said, a lot of data does easily fit this paradigm and as such would call for such an identifier. There is no single "always do X" answer in any programming environment, and this is another example.
Answered by gbn
If you have modelled, designed, normalised etc, then you will have no identity columns.
You will have identified natural and candidate keys for your tables.
You may decide on a surrogate key because of the physical architecture (eg narrow, numeric, strictly monotonically increasing), say, because using a nvarchar(100) column is not a good idea (still need unique constraint).
Or because of ideology: they appeal to OO developers I've found.
OK, assume ID columns. As your db gets more complex, say several layers deep, how can you join parent and grandchild tables directly? You can't: you always need intermediate tables and well-indexed PK-FK columns. With a composite key, it's all there for you...
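A small sketch of what "it's all there for you" can look like, assuming SQLite through Python's sqlite3 module and a hypothetical country / region / city hierarchy in which each level's composite key includes its parent's key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE country (code TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE region (
        country_code TEXT NOT NULL REFERENCES country (code),
        region_code TEXT NOT NULL,
        PRIMARY KEY (country_code, region_code)
    );
    CREATE TABLE city (
        country_code TEXT NOT NULL,
        region_code TEXT NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (country_code, region_code, name),
        FOREIGN KEY (country_code, region_code)
            REFERENCES region (country_code, region_code)
    );
    INSERT INTO country VALUES ('US', 'United States');
    INSERT INTO region VALUES ('US', 'CA');
    INSERT INTO city VALUES ('US', 'CA', 'Fresno');
""")

# The grandchild row already carries the grandparent's key,
# so the intermediate region table is not needed in the join.
rows = conn.execute(
    "SELECT country.name, city.name FROM country "
    "JOIN city ON city.country_code = country.code"
).fetchall()
print(rows)
```

With a surrogate ID on every level, city would only hold a region id, and the same query would have to pass through region to reach country.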
Don't get me wrong: I use them. But I know why I use them...
Edit:
I'd be interested to collate "always ID"+"no stored procs" matches on one hand, with "use stored procs"+"IDs when they benefit" on the other...
Answered by Charles Bretana
Recognize the distinction between an Identity field and a key... Every table should have a key, to eliminate the data corruption of inadvertently entering multiple rows that represent the same 'entity'. If the only key a table has is a meaningless surrogate key, then this function is effectively missing.
OTOH, no table 'needs' an identity, and certainly not every table benefits from one... Examples are: a table with a short and functional key, a table which does not have any other table referencing it through a foreign key, or a table which is in a one to zero-or-one relationship with another table... none of these need an identity.
Answered by jeje
I can't think of any drawback to having an ID field in each table, provided the type of your ID field leaves enough room for your table to grow.
However, you don't necessarily need a single field to ensure the identity of your rows. So no, a single ID field is not mandatory.
Primary and foreign keys can consist not only of one field, but of multiple fields. This is typical for tables implementing an N-N relationship.
You can perfectly well have PRIMARY KEY (fa, fb) on your table:
CREATE TABLE t (fa INT NOT NULL, fb INT NOT NULL);
ALTER TABLE t ADD PRIMARY KEY (fa, fb);
Answered by marc_s
Yes, for the vast majority of cases.
Edge cases or exceptions might be things like:
- two-way join tables to model m:n relationships
- temporary tables used for bulk-inserting huge amounts of data
But other than that, I think there is no good reason against having a primary key to uniquely identify each row in a table, and in my opinion, using an IDENTITY field is one of the best choices (I prefer surrogate keys over natural keys - they're more reliable, stable, never changing etc.).
Marc
Answered by Matthew Jones
I'd say, if you can find a simple, natural key in your table (i.e. one column), use that as a key instead of an identity column.
I generally give every table some kind of unique identifier, whether it is natural or generated, because then I am guaranteed that every row is uniquely identified somehow.
Personally, I avoid IDENTITY (incrementing identity columns, like 1, 2, 3, 4) columns like the plague. They cause a lot of hassle, especially if you delete rows from that table. I use generated uniqueidentifiers instead if there is no natural key in the table.
Anyway, no idea if this is the accepted practice, just seems right to me. YMMV.