database: What's wrong with nullable columns in composite primary keys?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/386040/

Date: 2020-09-08 07:08:06  Source: igfitidea

What's wrong with nullable columns in composite primary keys?

Tags: database, database-design

Asked by Roman Starkov

ORACLE does not permit NULL values in any of the columns that comprise a primary key. It appears that the same is true of most other "enterprise-level" systems.

At the same time, most systems also allow unique constraints on nullable columns.

Why is it that unique constraints can have NULLs but primary keys can not? Is there a fundamental logical reason for this, or is this more of a technical limitation?

Answer by Tomalak

Primary keys are for uniquely identifying rows. This is done by comparing all parts of a key to the input.

Per definition, NULL cannot be part of a successful comparison. Even a comparison to itself (NULL = NULL) will fail. This means a key containing NULL would not work.
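
To see what "will fail" means in practice, here is a minimal sketch. It assumes SQLite driven through Python's standard sqlite3 module, purely because that is easy to run anywhere; other engines may print the unknown result differently, but the comparison should behave the same way.

```python
import sqlite3

# In-memory SQLite database; no schema is needed for scalar comparisons.
conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# An ordinary comparison yields 1 (true) in SQLite.
one_eq_one = conn.execute("SELECT 1 = 1").fetchone()[0]

# A WHERE clause keeps a row only when its condition is true,
# so an unknown result filters the row out.
rows_kept = conn.execute("SELECT 'kept' WHERE NULL = NULL").fetchall()

print(null_eq_null, one_eq_one, rows_kept)
```

The comparison does not raise an error; it just never comes out true, which is why a key lookup built on it cannot match.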

Additionally, NULL is allowed in a foreign key, to mark an optional relationship.(*) Allowing it in the PK as well would break this.

(*) A word of caution: having nullable foreign keys is not clean relational database design.

If there are two entities A and B where A can optionally be related to B, the clean solution is to create a resolution table (let's say AB). That table would link A with B: if there is a relationship then it would contain a record, if there isn't then it would not.
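
A rough sketch of such a resolution table, assuming SQLite via Python's sqlite3 module. The table and column names (a, b, ab, a_id, b_id) are made up for illustration, and making a_id the primary key of ab is my assumption that each A relates to at most one B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Resolution table: a row exists only when the relationship exists.
    -- a_id is the primary key, so each A is linked to at most one B.
    CREATE TABLE ab (
        a_id INTEGER PRIMARY KEY REFERENCES a(id),
        b_id INTEGER NOT NULL REFERENCES b(id)
    );
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (10, 'b10');
    INSERT INTO ab VALUES (1, 10);  -- a1 is related to b10; a2 has no row
""")

# Joining through ab lists only the A rows that have a relationship.
related = conn.execute(
    "SELECT a.name, b.name FROM a JOIN ab ON ab.a_id = a.id "
    "JOIN b ON b.id = ab.b_id ORDER BY a.id"
).fetchall()

# A rows with no entry in ab are the unrelated ones.
unrelated = conn.execute(
    "SELECT name FROM a WHERE id NOT IN (SELECT a_id FROM ab) ORDER BY id"
).fetchall()

print(related, unrelated)
```

No column in any of the three tables is nullable; the absence of a relationship is expressed by the absence of a row.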

Answer by Tony Andrews

A primary key defines a unique identifier for every row in a table: when a table has a primary key, you have a guaranteed way to select any row from it.

A unique constraint does not necessarily identify every row; it just specifies that if a row has values in its columns, then they must be unique. This is not sufficient to uniquely identify every row, which is what a primary key must do.
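
The difference can be sketched as follows, assuming SQLite via Python's sqlite3 module and a hypothetical customer table. How many NULLs a unique constraint tolerates varies between database systems, so treat this as one engine's behavior rather than a universal rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two rows without an email: both are accepted, since NULLs do not collide.
conn.execute("INSERT INTO customer (email) VALUES (NULL)")
conn.execute("INSERT INTO customer (email) VALUES (NULL)")
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")

# A second row with the same non-NULL email violates the constraint.
try:
    conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The email column cannot single out either of the two NULL rows.
found_by_null_email = conn.execute(
    "SELECT count(*) FROM customer WHERE email = NULL"
).fetchone()[0]
total_rows = conn.execute("SELECT count(*) FROM customer").fetchone()[0]

print(duplicate_rejected, found_by_null_email, total_rows)
```

Every row is still reachable through id, the primary key, but not through the merely unique email column.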

Answer by zxq9

Fundamentally speaking, nothing is wrong with a NULL in a multi-column primary key. But having one has implications the designer likely did not intend, which is why many systems throw an error when you try this.

Consider the case of module/package versions stored as a series of fields:

CREATE TABLE module
  (name        varchar(20) PRIMARY KEY,
   description text DEFAULT '' NOT NULL);

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20),
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

The first 5 elements of the primary key are regularly defined parts of a release version, but some packages have a customized extension that is usually not an integer (like "rc-foo" or "vanilla" or "beta" or whatever else someone for whom four fields is insufficient might dream up). If a package does not have an extension, then it is NULL in the above model, and no harm would be done by leaving things that way.

But what is a NULL? It is supposed to represent a lack of information, an unknown. That said, perhaps this makes more sense:

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20) DEFAULT '' NOT NULL,
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

In this version the "ext" part of the tuple is NOT NULL but defaults to an empty string -- which is semantically (and practically) different from a NULL. A NULL is an unknown, whereas an empty string is a deliberate record of "something not being present". In other words, "empty" and "null" are different things. It's the difference between "I don't have a value here" and "I don't know what the value here is."
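
The practical difference shows up as soon as the same version is registered twice. The following is only a sketch: it assumes SQLite via Python's sqlite3 module, trims the schema down to three columns, uses hypothetical table names (version_nullable, version_empty), and lets a UNIQUE constraint stand in for the key in the nullable variant, since many systems refuse a nullable column inside PRIMARY KEY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Nullable ext: UNIQUE stands in for the key here.
    CREATE TABLE version_nullable (
        module TEXT NOT NULL, major INTEGER NOT NULL, ext TEXT,
        UNIQUE (module, major, ext)
    );
    -- ext defaults to the empty string and is part of a real primary key.
    CREATE TABLE version_empty (
        module TEXT NOT NULL, major INTEGER NOT NULL,
        ext TEXT DEFAULT '' NOT NULL,
        PRIMARY KEY (module, major, ext)
    );
""")

def insert_twice(sql):
    """Run the same INSERT twice; return True if the second one is rejected."""
    conn.execute(sql)
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Neither INSERT names ext, so it becomes NULL in one table and '' in the other.
nullable_rejected = insert_twice(
    "INSERT INTO version_nullable (module, major) VALUES ('foo', 1)"
)
empty_rejected = insert_twice(
    "INSERT INTO version_empty (module, major) VALUES ('foo', 1)"
)

print(nullable_rejected, empty_rejected)
```

With a NULL extension the duplicate slips through, because the two unknowns are not considered equal; with an empty string the second insert is rejected as one would expect from a key.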

When you register a package that lacks a version extension you know it lacks an extension, so an empty string is actually the correct value. A NULL would only be correct if you didn't know whether it had an extension or not, or you knew that it did but didn't know what it was. This situation is easier to deal with in systems where string values are the norm, because there is no way to represent an "empty integer" other than inserting 0 or 1, which will wind up being rolled up in any comparisons made later (which has its own implications)*.

Incidentally, both ways are valid in Postgres (since we're discussing "enterprise" RDBMSs), but comparison results can vary quite a bit when you throw a NULL into the mix -- because NULL == "don't know" so all results of a comparison involving a NULL wind up being NULL since you can't know something that is unknown. DANGER! Think carefully about that: this means that NULL comparison results propagate through a series of comparisons. This can be a source of subtle bugs when sorting, comparing, etc.
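
A small illustration of that propagation, assuming SQLite via Python's sqlite3 module and a hypothetical pkg table (Postgres should give equivalent results, though I have only sketched it against SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pkg (name TEXT NOT NULL, ext TEXT);
    INSERT INTO pkg VALUES ('a', 'beta'), ('b', ''), ('c', NULL);
""")

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

# NULL survives being wrapped in further operators.
not_result = scalar("SELECT NOT (NULL = NULL)")            # still NULL
and_result = scalar("SELECT (1 = 1) AND (NULL = 'beta')")  # still NULL

# Package 'c' matches neither filter, so the two counts do not add up to total.
is_beta = scalar("SELECT count(*) FROM pkg WHERE ext = 'beta'")
not_beta = scalar("SELECT count(*) FROM pkg WHERE ext <> 'beta'")
total = scalar("SELECT count(*) FROM pkg")

print(not_result, and_result, is_beta, not_beta, total)
```

The row with the NULL extension silently drops out of both the "is beta" and the "is not beta" result, which is exactly the kind of subtle bug meant above.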

Postgres assumes you're an adult and can make this decision for yourself. Oracle and DB2 assume you didn't realize you were doing something silly and throw an error. This is usually the right thing, but not always -- you might actually not know and have a NULL in some cases, and therefore leaving a row with an unknown element against which meaningful comparisons are impossible is correct behavior.

In any case you should strive to minimize the number of NULL fields you permit across the entire schema, and doubly so when it comes to fields that are part of a primary key. In the vast majority of cases the presence of NULL columns is an indication of un-normalized (as opposed to deliberately de-normalized) schema design and should be thought very hard about before being accepted.

[* NOTE: It is possible to create a custom type that is the union of integers and a "bottom" type that would semantically mean "empty" as opposed to "unknown". Unfortunately this introduces a bit of complexity in comparison operations and usually being truly type correct isn't worth the effort in practice as you shouldn't be permitted many NULL values at all in the first place. That said, it would be wonderful if RDBMSs would include a default BOTTOM type in addition to NULL to prevent the habit of casually conflating the semantics of "no value" with "unknown value".]

Answer by Cogsy

NULL == NULL -> false (at least in DBMSs)

So you wouldn't be able to retrieve any relationships using a NULL value even with additional columns with real values.
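
For example (a sketch assuming SQLite via Python's sqlite3 module; the link table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (part_a INTEGER, part_b INTEGER, payload TEXT);
    INSERT INTO link VALUES (1, NULL, 'row with a NULL key part');
    INSERT INTO link VALUES (1, 2, 'row with a complete key');
""")

def lookup(a, b):
    """Look a row up by both key parts using ordinary equality."""
    return conn.execute(
        "SELECT payload FROM link WHERE part_a = ? AND part_b = ?", (a, b)
    ).fetchall()

complete = lookup(1, 2)      # found
with_null = lookup(1, None)  # not found: part_b = NULL is never true

print(complete, with_null)
```

The part_a = 1 half of the condition is true for both rows, but it cannot rescue the lookup once the other half is unknown.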

Answer by Rami Ojares

The answer by Tony Andrews is a decent one. But the real answer is that this has been a convention used by the relational database community and is NOT a necessity. Maybe it is a good convention, maybe not.

Comparing anything to NULL results in UNKNOWN (the third truth value). So, as has been suggested, with nulls all traditional wisdom concerning equality goes out the window. Well, that's how it seems at first glance.

But I don't think this is necessarily so and even SQL databases don't think that NULL destroys all possibility for comparison.

Run in your database the query SELECT * FROM VALUES(NULL) UNION SELECT * FROM VALUES(NULL)

What you see is just one tuple with one attribute that has the value NULL. So the union recognized here the two NULL values as equal.
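
The exact VALUES syntax differs between systems, so here is an equivalent sketch written as SELECT NULL UNION SELECT NULL. It assumes SQLite via Python's sqlite3 module; the UNION ALL and DISTINCT queries are my additions for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION removes duplicates; the two NULLs collapse into one row.
union_rows = conn.execute("SELECT NULL UNION SELECT NULL").fetchall()

# UNION ALL keeps duplicates, for contrast.
union_all_rows = conn.execute("SELECT NULL UNION ALL SELECT NULL").fetchall()

# DISTINCT treats the two NULLs as duplicates in the same way.
distinct_rows = conn.execute(
    "SELECT DISTINCT x FROM (SELECT NULL AS x UNION ALL SELECT NULL)"
).fetchall()

print(union_rows, union_all_rows, distinct_rows)
```

So duplicate elimination already uses a notion of sameness under which NULL matches NULL, even though the = operator does not.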

When comparing a composite key that has 3 components to a tuple with 3 attributes, (1, 3, NULL) = (1, 3, NULL) <=> 1 = 1 AND 3 = 3 AND NULL = NULL. The result of this is UNKNOWN.

But we could define a new kind of comparison operator, e.g. ==, such that X == Y <=> X = Y OR (X IS NULL AND Y IS NULL).
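
Something close to this already exists in several systems: standard SQL spells it IS NOT DISTINCT FROM, and SQLite lets the IS operator compare arbitrary values. The sketch below assumes SQLite via Python's sqlite3 module; proposed_eq is a hypothetical helper that evaluates the formula above literally, so its results can be compared with the built-in IS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql, params=()):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql, params).fetchone()[0]

def proposed_eq(x, y):
    """Evaluate X = Y OR (X IS NULL AND Y IS NULL) for the given values."""
    return scalar(
        "SELECT (:x = :y) OR (:x IS NULL AND :y IS NULL)", {"x": x, "y": y}
    )

# Component-wise comparison of (1, 3, NULL) with itself.
with_equals = scalar("SELECT 1 = 1 AND 3 = 3 AND NULL = NULL")  # NULL (unknown)
with_is = scalar("SELECT 1 IS 1 AND 3 IS 3 AND NULL IS NULL")   # 1 (true)

both_null = proposed_eq(None, None)  # 1
same_value = proposed_eq(3, 3)       # 1
different = proposed_eq(3, 4)        # 0
one_null = proposed_eq(3, None)      # NULL: still unknown rather than false
builtin_one_null = scalar("SELECT 3 IS NULL")  # 0: IS gives a definite false

print(with_equals, with_is, both_null, same_value, different, one_null,
      builtin_one_null)
```

One detail worth noting: taken literally, the formula still yields UNKNOWN when exactly one side is NULL, whereas IS returns a definite false. Inside a WHERE clause both filter the row out, so for key lookups the two behave alike.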

Having this kind of equality operator would make composite keys with null components or non-composite key with null value unproblematic.

Answer by Adriaan Davel

I still believe this is a fundamental / functional flaw brought about by a technicality. If you have an optional field by which you can identify a customer, you now have to hack a dummy value into it, just because NULL != NULL. That is not particularly elegant, yet it is an "industry standard".