When to use NULL in MySQL tables
Original source: http://stackoverflow.com/questions/471367/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by DavidWinterbottom
I appreciate the semantic meaning of a NULL value in a database table, different from both false and the empty string ''. However, I have often read about performance problems when fields are nullable and been advised to use an empty string in cases where NULL is actually semantically correct.
What circumstances are appropriate to use nullable fields and NULL values? What are the trade-offs? Is it sensible to simply avoid using NULLs altogether and simply use empty strings, false or 0 to indicate the absence of a value?
UPDATE
OK - I understand the semantic difference between '' and NULL as well as the (performance-agnostic) circumstances in which NULL is the appropriate field value. However, let me expand on the hinted performance issue. This is from the excellent "High Performance MySQL" by Schwartz, Zaitsev et al. (http://www.borders.co.uk/book/high-performance-mysql-optimization-backups-replication-and-more/857673/):
It's harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM.
More here: Google books preview
This is quite possibly the definitive answer - I was just looking for second opinions and experience from the front-line.
Answered by Bill Karwin
However, I have often read about performance problems when fields are nullable and been advised to use an empty string in cases where NULL is actually semantically correct.
I'm going to be nit-picky about word choice for a moment:
- Even if it were a significant performance factor, that doesn't make it semantically correct to use a value instead of NULL. In SQL, NULL has a semantic role, to denote a missing or inapplicable value. The performance characteristics of NULL in a given RDBMS implementation are independent of this. The performance may vary from brand to brand or from version to version, but the purpose of NULL in the language is consistent.
In any case, I have not heard of any evidence that NULL performs poorly. I'd be interested in any references to performance measurements that show nullable columns perform worse than non-nullable columns.
I'm not saying I'm not wrong or that it can't be true in some cases -- just that it's not meaningful to make idle suppositions. Science is not made up of conjecture; one has to show evidence with repeatable measurements.
Metrics also tell you by how much the performance differs, so you can make a judgment about whether it's something worth worrying about. That is, the impact could be measurable and nonzero, but still insignificant compared to greater performance factors, such as properly indexing tables or sizing your database cache.
In MySQL, searches for NULL can benefit from an index:
mysql> CREATE TABLE foo (
  i INT NOT NULL,
  j INT DEFAULT NULL,
  PRIMARY KEY (i),
  UNIQUE KEY j_index (j)
);

mysql> INSERT INTO foo (i, j) VALUES
  (1, 1), (2, 2), (3, NULL), (4, NULL), (5, 5);

mysql> EXPLAIN SELECT * FROM foo WHERE i = 3;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | foo   | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT * FROM foo WHERE j IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | foo   | ref  | j_index       | j_index | 5       | const |    2 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
Note that's still not a measurement of performance. I've only shown that you can use an index while searching for NULL. I'm going to assert (admittedly without having measured, but hey this is just StackOverflow) that the benefit of an index overshadows any possible penalty when searching for NULL versus a blank string.
It's not a correct design decision to choose zero or blank or any other value to substitute for NULL. You may need to use those values as significant in the column. That's why NULL exists, as a value that is by definition outside the domain of values of any data type, so you can use the full range of values of integers or strings or whatever and still have something to signify "none of the above values."
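To make the point about stand-in values concrete, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL, since both skip NULLs in aggregates; the table reading and its columns are hypothetical names, not part of the original answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; AVG() ignores NULLs in both.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading ("
    " id INTEGER PRIMARY KEY,"
    " temp_null INTEGER,"            # missing measurement stored as NULL
    " temp_zero INTEGER NOT NULL)"   # missing measurement stored as 0
)
# Row 3 has no measurement at all.
conn.executemany(
    "INSERT INTO reading (id, temp_null, temp_zero) VALUES (?, ?, ?)",
    [(1, 10, 10), (2, 20, 20), (3, None, 0)],
)

avg_null, avg_zero = conn.execute(
    "SELECT AVG(temp_null), AVG(temp_zero) FROM reading"
).fetchone()

print(avg_null)  # 15.0: the missing row is left out of the average
print(avg_zero)  # 10.0: the stand-in 0 is treated as a real measurement
```

With 0 as the stand-in, a genuine reading of 0 can no longer be told apart from "no reading", which is the problem the answer describes.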
Answered by Ólafur Waage
The MySQL manual actually has a nice article about the problems with NULL.
Hope it helps.
Also found this other SO post about NULL and Performance
Answered by Kezzer
We don't allow NULL values in our databases unless it's for numeric values or for dates. The reason we do this is that numeric values sometimes should not be defaulted to zero, as this is very, very bad. I'm a developer for a stockbroker, and there's a big, big difference between NULL and 0. COALESCE comes in handy if we do want to default values back to zero even though we don't store them as such.
MyVal = COALESCE(TheData, 0)
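As a rough illustration of the COALESCE pattern above, the sketch below stores NULL but reads it back as 0. It assumes SQLite (via Python's sqlite3) in place of MySQL; the table trade and the column names the_data and my_val are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; COALESCE is standard SQL in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, the_data INTEGER)")
conn.executemany(
    "INSERT INTO trade (id, the_data) VALUES (?, ?)",
    [(1, 250), (2, None), (3, 0)],  # row 2 is unknown, row 3 is a real zero
)

# COALESCE returns its first non-NULL argument, so NULL is shown as 0
# in the result while the stored value stays NULL.
rows = conn.execute(
    "SELECT id, the_data, COALESCE(the_data, 0) AS my_val"
    " FROM trade ORDER BY id"
).fetchall()

for row in rows:
    print(row)
```

The stored column still distinguishes row 2 (unknown) from row 3 (a real zero); only the query result collapses them.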
As we do bulk inserts of data from flat files we use format files to determine the entry of the data which automagically converts empty values into blank strings anyway.
I believe dates default to a value that depends on the collation, but ours default to something like 1900, and again, dates are extremely important. Other plain-text values aren't so important, and if left blank they typically qualify as okay.
Answered by Jim Anderson
Generally, if an attribute is required, it is defined as NOT NULL, and if it may be omitted, it is defined as nullable.
Answered by ForYourOwnGood
The empty string should not be used in place of NULL. NULL represents nothing, whereas the empty string is something with nothing inside. A comparison between NULL and another value (even NULL) never evaluates to true (the result is NULL, which a WHERE clause treats like false), and NULLs are not counted by COUNT(column).
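The comparison and COUNT behaviour described above can be checked with a short sketch. It assumes SQLite (through Python's sqlite3) as a stand-in for MySQL, which follows the same three-valued logic here; the table t and column v are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; both follow SQL three-valued logic.
conn = sqlite3.connect(":memory:")

# Comparing with NULL yields NULL (None in Python), never true.
null_eq_null, null_eq_one, null_is_null = conn.execute(
    "SELECT NULL = NULL, NULL = 1, NULL IS NULL"
).fetchone()
print(null_eq_null, null_eq_one, null_is_null)  # None None 1

conn.execute("CREATE TABLE t (v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [("a",), ("",), (None,)])

# COUNT(*) counts rows; COUNT(v) skips rows where v is NULL.
# The empty string is a value, so it is counted.
count_all, count_v = conn.execute("SELECT COUNT(*), COUNT(v) FROM t").fetchone()
print(count_all, count_v)  # 3 2
```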
If you need to represent unknown information, there is no substitute for NULL.
Answered by user1105491
As @ForYourOwnGood said, NULL should be used for "unknown" information. For example, suppose you have a lot of fields that a customer should fill in on registration, and some of them are optional. For some reason you might want to reserve an ID for that particular customer, and since you don't yet know whether the customer really chose to leave the optional fields empty, you should set them to NULL, i.e. "unknown", when you first save the row. If the customer submits the form and passes all your validation, and then you save the information, you know for a fact that the optional field was left empty intentionally.
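A small sketch of the registration case might look like this. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the table customer and the optional column nickname are hypothetical names chosen for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, nickname TEXT)")

# Customer 1: ID reserved before the form is submitted, so the optional
# field is unknown and stored as NULL.
conn.execute("INSERT INTO customer (id, nickname) VALUES (1, NULL)")
# Customer 2: form submitted and validated with the optional field left
# blank on purpose, so it is stored as an empty string.
conn.execute("INSERT INTO customer (id, nickname) VALUES (2, '')")

pending = [r[0] for r in conn.execute(
    "SELECT id FROM customer WHERE nickname IS NULL")]
blank = [r[0] for r in conn.execute(
    "SELECT id FROM customer WHERE nickname = ''")]

print(pending)  # [1]
print(blank)    # [2]
```

Keeping both states lets a later query separate "not answered yet" from "answered with nothing".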
This is just a good case of using NULL.
Answered by Chuck Cochems
I understand that there are times when MySQL's NULL semantics are entirely appropriate.
That said, they do seriously get in the way, particularly with text fields.
Here's a real world example.
We wish to copy data from a FileMaker database into a mysql table.
If we do "SELECT * FROM table WHERE textfield <> 'test'", rows whose textfield is NULL will NOT be returned. This is most likely not what you expected or desired.
If a nullable field is used in a WHERE clause, either alone or as part of an AND, rows where it is NULL will NEVER be returned unless an IS NULL test is used. We must do something like "WHERE ((textfield <> 'test') OR (textfield IS NULL))", which is ugly at best.
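The behaviour described above can be reproduced with a short sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, which treats <> against NULL the same way; the table item is a hypothetical name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "item" is a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, textfield TEXT)")
conn.executemany(
    "INSERT INTO item (id, textfield) VALUES (?, ?)",
    [(1, "test"), (2, "other"), (3, None)],
)

# NULL <> 'test' evaluates to NULL, so row 3 is filtered out.
naive = [r[0] for r in conn.execute(
    "SELECT id FROM item WHERE textfield <> 'test' ORDER BY id")]

# Adding an explicit IS NULL test brings the NULL row back.
with_null_test = [r[0] for r in conn.execute(
    "SELECT id FROM item"
    " WHERE textfield <> 'test' OR textfield IS NULL ORDER BY id")]

print(naive)           # [2]
print(with_null_test)  # [2, 3]
```

In MySQL specifically, the NULL-safe equality operator <=> offers another spelling, for example NOT (textfield <=> 'test'); that operator is MySQL syntax and is not exercised by the SQLite sketch.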
So in this case, we probably do not want the field nullable.
The problem here is that you CANNOT insert an empty string into MySQL using FileMaker. It gets converted to NULL, which errors out if you made the column NOT NULL! If you do allow NULL, then the transfer to MySQL works, but then your negated (<>) queries fail to act the way you want!
The workaround is to alter the table twice: once to convert the existing NULLs to empty strings after the import, then again to allow NULL once more. Youch!
damn filemaker.
Answered by SquareCog
The main benefit, of course, is the semantic meaning of NULL, which you mentioned.
In addition to that -- and it may depend on your storage engine, as always, check the documentation -- but in at least some databases, NULLs take up a lot less room than a regular value. For example, if you have a "varchar" column declared to be 20 characters, and it's rarely filled in, you can save a lot of disk space by making it NULL instead of an empty string.
I have never heard of any performance issues with using NULLs; quite the opposite. I've heard of people mucking up their counts because they counted NULLs wrong, but never of performance problems. If that's a real thing, I would love to hear about it!
Answered by pilif
The meaning of a NULL column is more or less "doesn't apply in this context". I generally use NULL columns in two cases:
- If the field doesn't apply. Let's say you have a boolean column is_thirsty and you add two records, one for a human and one for a stone. For the human, you set is_thirsty to either true or false, whereas for the stone you'd probably set it to NULL.
- If I need to flag something and store some data with the flag. Take an inventory close date, which you'd use a) to specify that the inventory cannot be changed any more and b) to specify when the inventory was closed. Instead of having two columns (closed_at and is_closed), I just create the closed_at column and leave it NULL while the inventory set can still be changed, then set the date once it's closed.
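The second case above, one nullable column acting as both flag and timestamp, can be sketched as follows. This assumes SQLite (via Python's sqlite3) in place of MySQL; the table inventory and the sample date are hypothetical, while closed_at and is_closed are the names used in the answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table name and date are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, closed_at TEXT)")
conn.executemany(
    "INSERT INTO inventory (id, closed_at) VALUES (?, ?)",
    [(1, None), (2, "2009-01-23 10:00:00")],  # 1 is still open, 2 is closed
)

# The is_closed flag is derived from closed_at rather than stored separately.
rows = conn.execute(
    "SELECT id, closed_at IS NOT NULL AS is_closed"
    " FROM inventory ORDER BY id"
).fetchall()
print(rows)  # [(1, 0), (2, 1)]

# Only inventories that are still open may be changed.
open_ids = [r[0] for r in conn.execute(
    "SELECT id FROM inventory WHERE closed_at IS NULL")]
print(open_ids)  # [1]
```

Deriving the flag means the two facts cannot drift apart, which is one reason to prefer a single column here.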
Basically it boils down to the fact that I use NULL when the emptiness of a field has a different, unique meaning than just being an empty field. The absence of a middle initial is just that. The absence of a closing date means the inventory set is still open to changes.
NULL values can have nasty side effects, and they make it harder to add data to the table. More often than not you end up with a mishmash of NULL values and empty strings, for example.
Also, NULL is not equal to anything, which will screw queries all over the place if you are not very careful.
Personally, I use NULL columns only when one of the above two cases applies. I never use it to signify empty fields when the emptiness has no meaning other than the absence of a value.
Answered by dkretz
Any self-respecting database engine these days should offer no penalty for properly using NULLs, unless your query is not designed correctly (which is usually not a problem you'll have very often with regard to NULLs).
You should first pay attention to using the database (including NULLs) as intended; then worry about the optimization consequences if and when they occur.
The cumulative effect of improperly NULLed column values in both SQL complexity and accuracy will almost surely outweigh the benefits of fooling with Mother DBMS. Besides, it will mess up your head, as well as that of anyone later who tries to figure out what you were trying to do.