MySQL: unique key with NULLs
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/4081783/
Unique key with NULLs
Asked by Jason Swett
This question requires some hypothetical background. Let's consider an employee table that has the columns name, date_of_birth, title and salary, using MySQL as the RDBMS. Since any two people with the same name and birth date are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
If I now try to run the following statement, it should and will fail:
INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', 95000);
If I try this one, it will succeed:
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', 40000);
And now my data will look like this:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
5   Jim Johnson  NULL           Office Manager  40,000
This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,
{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN
My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."
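The guess above matches how SQL compares NULLs: `NULL = NULL` evaluates to NULL (unknown) rather than true, so the unique key never sees the two Jim Johnson rows as equal. Here is a minimal sketch of that behavior. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere; SQLite likewise accepts repeated NULLs under a UNIQUE constraint), and the table is trimmed to the key columns.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT,"  # nullable, as in the question
    " UNIQUE (name, date_of_birth))"
)

# NULL = NULL is neither true nor false; SQL reports NULL (None in Python).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]


def try_insert(name, date_of_birth):
    """Return True if the row was stored, False if the unique key rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee (name, date_of_birth) VALUES (?, ?)",
            (name, date_of_birth),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_tim = try_insert("Tim Smith", "1899-04-11")
second_tim = try_insert("Tim Smith", "1899-04-11")  # exact duplicate
first_jim = try_insert("Jim Johnson", None)
second_jim = try_insert("Jim Johnson", None)  # NULLs never compare equal

print(null_equals_null, first_tim, second_tim, first_jim, second_jim)
```

Only the second Tim Smith is rejected; both Jim Johnson rows are stored.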
My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary but different birth dates, and no way to store them both without having duplicates.
Accepted answer by NealB
A fundamental property of a unique key is that it must be unique. Making part of that key nullable destroys this property.
There are two possible solutions to your problem:
One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same, or are they unique individuals? If you know they are different, then you are back to the same old problem: your unique key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown"; this is truly the road to hell.
A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (a unique, non-nullable key) to what you believe are the dependent attributes, in this case Name and Date of Birth (either of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth pair. This adds a new table to your system but solves the problem of unknown values in a robust manner.
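A minimal sketch of the surrogate-key layout described above, with SQLite (through Python's sqlite3 module) standing in for MySQL. The column name employee_id, the helper add_employee and the sample birth date are illustrative assumptions, not part of the answer. The point is only that identity comes from the assigned key, so two people with the same name and an unknown birth date can coexist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"  # surrogate key: unique, never NULL
    " name TEXT NOT NULL,"
    " date_of_birth TEXT)"  # may stay NULL while unknown
)


def add_employee(name, date_of_birth=None):
    """Store one person and return the surrogate key assigned to them."""
    cur = conn.execute(
        "INSERT INTO employee (name, date_of_birth) VALUES (?, ?)",
        (name, date_of_birth),
    )
    return cur.lastrowid


jim_a = add_employee("Jim Johnson")
jim_b = add_employee("Jim Johnson")  # a different person with the same name

# Learning a birth date later (made-up value) updates one person only.
conn.execute(
    "UPDATE employee SET date_of_birth = ? WHERE employee_id = ?",
    ("1975-03-01", jim_a),
)
rows = conn.execute(
    "SELECT employee_id, name, date_of_birth FROM employee ORDER BY employee_id"
).fetchall()
print(rows)
```

Other tables would then reference employee_id rather than the name and birth date pair.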
Answer by Mark Byers
I think MySQL does it right here. Some other databases (for example Microsoft SQL Server) treat NULL as a value that can only be inserted once into a UNIQUE column, but personally I find this to be strange and unexpected behaviour.
However, since this is what you want, you can use some "magic" value instead of NULL, such as a date a long time in the past.
Answer by HLGEM
Your problem of preventing duplicates based on name is not solvable, because you do not have a natural key. Putting a fake date in for people whose date of birth is unknown will not solve your problem. John Smith born 1900/01/01 is still going to be a different person than John Smith born 1960/03/09.
I work with name data from large and small organizations every day, and I can assure you they have two different people with the same name all the time, sometimes with the same job title. Birthdate is no guarantee of uniqueness either; plenty of John Smiths were born on the same date. Heck, when we work with physicians' office data, we often have two doctors with the same name, address and phone number (father and son combinations).
Your best bet, if you are inserting employee data, is to have an employee ID that identifies each employee uniquely. Then check for an existing name in the user interface, and if there are one or more matches, ask the user whether he meant one of them; if he says no, insert the record. Then build a deduping process to fix problems if someone gets assigned two IDs by accident.
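The check-then-ask flow above could be sketched like this. SQLite via Python's sqlite3 module stands in for MySQL, and the confirm_new callback stands in for the user-interface prompt. Both, along with the function and column names, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"  # identifies each employee uniquely
    " name TEXT NOT NULL,"
    " date_of_birth TEXT)"
)


def find_possible_matches(name):
    """Return existing rows with this name so a person can review them."""
    return conn.execute(
        "SELECT employee_id, name, date_of_birth FROM employee WHERE name = ?",
        (name,),
    ).fetchall()


def add_employee(name, date_of_birth, confirm_new):
    """Insert unless the user recognises one of the existing matches.

    confirm_new(matches) is a hypothetical stand-in for the UI prompt: it
    returns True when none of the matches is the person being entered.
    """
    matches = find_possible_matches(name)
    if matches and not confirm_new(matches):
        return None  # the user meant an existing record; nothing is inserted
    cur = conn.execute(
        "INSERT INTO employee (name, date_of_birth) VALUES (?, ?)",
        (name, date_of_birth),
    )
    return cur.lastrowid


first = add_employee("John Smith", None, confirm_new=lambda matches: True)
same_person = add_employee("John Smith", None, confirm_new=lambda matches: False)
other_person = add_employee("John Smith", None, confirm_new=lambda matches: True)
print(first, same_person, other_person)
```

The database never blocks a same-name insert here. The decision is left to the person entering the data, which is why the answer also calls for a separate cleanup process.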
Answer by Alexander Yancharuk
I recommend adding an extra table column, checksum, which will contain the MD5 hash of name and date_of_birth. Drop the unique key (name, date_of_birth), because it doesn't solve the problem, and create a single unique key on checksum.
ALTER TABLE employee
ADD COLUMN checksum CHAR(32) NOT NULL;
UPDATE employee
SET checksum = MD5(CONCAT(name, IFNULL(date_of_birth, '')));
ALTER TABLE employee
ADD UNIQUE (checksum);
This solution adds a small technical overhead, because you need to generate a hash for every inserted pair (and for every search query). As a further improvement, you can add a trigger that generates the hash for you on every insert:
CREATE TRIGGER before_insert_employee
BEFORE INSERT ON employee
FOR EACH ROW
IF new.checksum IS NULL THEN
SET new.checksum = MD5(CONCAT(new.name, IFNULL(new.date_of_birth, '')));
END IF;
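To see the effect of the checksum column without a MySQL server, here is a small sketch that computes the hash in application code. It assumes SQLite (through Python's sqlite3 module) as a stand-in database and uses hashlib to mirror the `MD5(CONCAT(name, IFNULL(date_of_birth, '')))` expression. The helper names are made up for the example.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT,"
    " checksum TEXT NOT NULL UNIQUE)"  # the only unique key on the pair
)


def checksum(name, date_of_birth):
    """Hash name plus birth date, treating a missing date as an empty string."""
    raw = name + (date_of_birth if date_of_birth is not None else "")
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def try_insert(name, date_of_birth):
    """Return True if the row was stored, False if the checksum already exists."""
    try:
        conn.execute(
            "INSERT INTO employee (name, date_of_birth, checksum) VALUES (?, ?, ?)",
            (name, date_of_birth, checksum(name, date_of_birth)),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_jim = try_insert("Jim Johnson", None)
second_jim = try_insert("Jim Johnson", None)  # same hash, so rejected
tim = try_insert("Tim Smith", "1899-04-11")
print(first_jim, second_jim, tim)
```

Because the hash input is never NULL, two rows with the same name and an unknown birth date produce the same checksum and collide on the unique key.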
Answer by Mike Lue
There is another way to do it: add a non-nullable column that holds the string value of the date_of_birth column. The new column's value is "" (an empty string) if date_of_birth is NULL.
We name the column date_of_birth_str and create a unique constraint on employee(name, date_of_birth_str). So when two records arrive with the same name and a NULL date_of_birth value, the unique constraint still works.
However, the effort of maintaining two columns with the same meaning, and the performance cost of the new column, should be considered carefully.
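A rough sketch of the date_of_birth_str idea, with SQLite via Python's sqlite3 module standing in for MySQL. The helper add_employee is an assumption. It derives the shadow column in a single place, which is one way to limit the maintenance cost mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " date_of_birth TEXT,"
    " date_of_birth_str TEXT NOT NULL,"  # '' when date_of_birth is NULL
    " UNIQUE (name, date_of_birth_str))"
)


def add_employee(name, date_of_birth=None):
    """Insert a row, deriving date_of_birth_str so both columns stay in sync."""
    date_of_birth_str = "" if date_of_birth is None else date_of_birth
    try:
        conn.execute(
            "INSERT INTO employee (name, date_of_birth, date_of_birth_str)"
            " VALUES (?, ?, ?)",
            (name, date_of_birth, date_of_birth_str),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # same name and same (possibly empty) date string


first_jim = add_employee("Jim Johnson")
second_jim = add_employee("Jim Johnson")  # rejected: ('Jim Johnson', '') exists
jane = add_employee("Jane Doe", "1982-05-05")
print(first_jim, second_jim, jane)
```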
Answer by romor
You can add a generated column where the NULL value is replaced by an unused constant, e.g. zero. Then you can apply the unique constraint to this column instead of the nullable one:
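CREATE TABLE employee (
name VARCHAR(50) NOT NULL,
date_of_birth DATE,
-- Generated column: a NULL birth date is mapped to the zero date.
uq_date_of_birth DATE AS (IFNULL(date_of_birth, '0000-00-00')),
-- Pair it with name so that only identical (name, date) pairs collide.
UNIQUE KEY uq_name_dob (name, uq_date_of_birth)
);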
Answer by kingledion
I had a similar problem to this, but with a twist. In your case, every employee has a birthday, although it may be unknown. In that case, it makes logical sense for the system to keep two separate entries for employees with unknown birthdays but otherwise identical information. NealB's accepted answer is very accurate.
However, the problem I encountered was one in which the data field did not necessarily have a value. For example, if you added a 'name_of_spouse' field to your table, there wouldn't necessarily be a value for each row of the table. In that case, NealB's first bullet point (the 'wrong way') actually makes sense. In this case, a string 'None' should be inserted in the column name_of_spouse for each row in which there was no known spouse.
The situation where I ran into this problem was in writing a program with database to classify IP traffic. The goal was to create a graph of IP traffic on a private network. Each packet was put into a database table with a unique connection index based on its ip source and dest, port source and dest, transport protocol, and application protocol. However, many packets simply don't have an application protocol. For example, all TCP packets without an application protocol should be classed together, and should occupy one unique entry in the connections index. This is because I want those packets to form a single edge of my graph. In this situation, I took my own advice from above, and stored a string 'None' in the application protocol field to ensure that these packets formed a unique group.
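A minimal sketch of the connection index described above, under several assumptions: SQLite via Python's sqlite3 module stands in for the database, and the table name, column names, helper function and sample addresses are all invented for illustration. Packets without an application protocol are stored with the string 'None', so they share one row (one graph edge).

```python
import sqlite3

NO_APP_PROTOCOL = "None"  # sentinel: the packet has no application protocol

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connections ("
    " id INTEGER PRIMARY KEY,"
    " ip_src TEXT NOT NULL, ip_dst TEXT NOT NULL,"
    " port_src INTEGER NOT NULL, port_dst INTEGER NOT NULL,"
    " transport TEXT NOT NULL,"
    " application TEXT NOT NULL,"  # never NULL, so the unique key always applies
    " UNIQUE (ip_src, ip_dst, port_src, port_dst, transport, application))"
)


def connection_id(ip_src, ip_dst, port_src, port_dst, transport, application=None):
    """Return the id of the edge this packet belongs to, creating it if needed."""
    if application is None:
        application = NO_APP_PROTOCOL
    key = (ip_src, ip_dst, port_src, port_dst, transport, application)
    row = conn.execute(
        "SELECT id FROM connections WHERE ip_src = ? AND ip_dst = ?"
        " AND port_src = ? AND port_dst = ? AND transport = ? AND application = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO connections"
        " (ip_src, ip_dst, port_src, port_dst, transport, application)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        key,
    )
    return cur.lastrowid


edge_a = connection_id("10.0.0.1", "10.0.0.2", 5000, 80, "TCP")
edge_b = connection_id("10.0.0.1", "10.0.0.2", 5000, 80, "TCP")  # same edge
edge_http = connection_id("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", "HTTP")
print(edge_a, edge_b, edge_http)
```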
Answer by Paul
The perfect solution would be support for function-based unique keys, but that becomes more complex because MySQL would then also need to support function-based indexes. This would remove the need to use "fake" values in place of NULL, while also letting developers decide how NULL values are treated in unique keys. Unfortunately, as far as I am aware, MySQL doesn't currently support such functionality, so we're left with workarounds.
CREATE TABLE employee(
name CHAR(50) NOT NULL,
date_of_birth DATE,
title CHAR(50),
UNIQUE KEY idx_name_dob (name, IFNULL(date_of_birth,'0000-00-00 00:00:00'))
);
(Note the use of the IFNULL() function in the unique key definition.)
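To illustrate what such a function-based unique key would do, here is a sketch using SQLite, which does support indexes on expressions, through Python's sqlite3 module. This is a stand-in chosen so the example can be run. It says nothing about which MySQL versions accept a comparable definition, and the empty-string replacement value is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " name TEXT NOT NULL,"
    " date_of_birth TEXT,"
    " title TEXT)"
)
# Unique index over an expression: NULL birth dates are compared as ''.
conn.execute(
    "CREATE UNIQUE INDEX idx_name_dob"
    " ON employee (name, IFNULL(date_of_birth, ''))"
)


def try_insert(name, date_of_birth, title):
    """Return True if the row was stored, False if the index rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee (name, date_of_birth, title) VALUES (?, ?, ?)",
            (name, date_of_birth, title),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_jim = try_insert("Jim Johnson", None, "Office Manager")
second_jim = try_insert("Jim Johnson", None, "Office Manager")  # rejected
tim = try_insert("Tim Smith", "1899-04-11", "Janitor")

# The stored value is still NULL; only the index sees the replacement.
stored_dob = conn.execute(
    "SELECT date_of_birth FROM employee WHERE name = 'Jim Johnson'"
).fetchone()[0]
print(first_jim, second_jim, tim, stored_dob)
```

The table keeps a real NULL for the unknown birth date, so no fake value leaks into the data. The substitution exists only inside the index.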
Answer by Lordferrous
In simple words, the role of a unique constraint is to make the field or column unique. NULL destroys this property, because the database treats NULL as unknown.
In order to avoid duplicates and allow NULL:
Make the unique key the primary key.