Python: What is the difference between NaN and None?

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/17534106/

Time: 2020-08-19 08:27:28  Source: igfitidea

What is the difference between NaN and None?

Tags: python, numpy, pandas, nan

Asked by user1083734

I am reading two columns of a csv file using pandas read_csv() and then assigning the values to a dictionary. The columns contain strings of numbers and letters. Occasionally there are cases where a cell is empty. In my opinion, the value read to that dictionary entry should be None, but instead nan is assigned. Surely None is more descriptive of an empty cell as it has a null value, whereas nan just says that the value read is not a number.

Is my understanding correct? What IS the difference between None and nan? Why is nan assigned instead of None?

Also, my dictionary check for any empty cells has been using numpy.isnan():

for k, v in my_dict.iteritems():
    if np.isnan(v):

But this gives me an error saying that I cannot use this check for v. I guess it is because an integer or float variable, not a string, is meant to be used. If this is true, how can I check v for an "empty cell"/nan case?

Accepted answer by Andy Hayden

NaN is used as a placeholder for missing data consistently in pandas; consistency is good. I usually read/translate NaN as "missing". Also see the 'working with missing data' section in the docs.

Wes writes in the docs 'choice of NA-representation':

After years of production use [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions isnull and notnull which can be used across the dtypes to detect NA values.
...
Thus, I have chosen the Pythonic “practicality beats purity” approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.

Note: the "gotcha" that integer Series containing missing data are upcast to floats.
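The upcast mentioned in the note can be seen with a small sketch. The Series names here are hypothetical, chosen only for illustration, and the behaviour shown is the default dtype inference rather than the nullable integer types that newer pandas versions also offer.

```python
import pandas as pd

# An all-integer Series keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# Adding a missing value makes pandas store the data as floats,
# with the missing entry represented as NaN.
with_missing = pd.Series([1, 2, None])

print(ints.dtype)
print(with_missing.dtype)                # float64
print(with_missing.isnull().tolist())    # [False, False, True]
```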

In my opinion the main reason to use NaN (over None) is that it can be stored with numpy's float64 dtype, rather than the less efficient object dtype (see "NA type promotions" in the docs).

import numpy as np
import pandas as pd

# without forcing dtype it changes None to NaN!
s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

In [13]: s_bad.dtype
Out[13]: dtype('O')

In [14]: s_good.dtype
Out[14]: dtype('float64')

Jeff comments (below) on this:

np.nan allows for vectorized operations; it's a float value, while None, by definition, forces object type, which basically disables all efficiency in numpy.

So repeat 3 times fast: object==bad, float==good

That said, many operations may still work just as well with None as with NaN (but perhaps are not supported, i.e. they may sometimes give surprising results):

In [15]: s_bad.sum()
Out[15]: 1

In [16]: s_good.sum()
Out[16]: 1.0

To answer the second question:
You should be using pd.isnull and pd.notnull to test for missing data (NaN).
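As a rough sketch of how that check could look for the dictionary in the question: the contents of my_dict below are made up for illustration and only stand in for values read from the CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dictionary built from the two CSV columns.
my_dict = {"a": "x1", "b": np.nan, "c": None, "d": "42"}

# pd.isnull accepts strings, floats and None alike.
missing_keys = [k for k, v in my_dict.items() if pd.isnull(v)]
print(missing_keys)  # ['b', 'c']

# np.isnan only supports numeric input, so a string raises TypeError,
# which is the kind of error described in the question.
try:
    np.isnan("x1")
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)  # True
```

Note that pd.isnull treats both NaN and None as missing, so the loop does not need to distinguish between them.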

Answer by diegoaguilar

NaN stands for "Not a Number".
None might stand for anything.

Answer by Stephan

The function isnan() checks whether something is "Not A Number": it returns True only when the value is NaN, so, for example, isnan(2) would return False.

The conditional myVar is not None returns whether or not the variable holds a value other than None.

Your numpy array uses isnan() because it is intended to be an array of numbers, and it initializes all elements of the array to NaN; these elements are considered "empty".
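The two checks described in this answer can be tried side by side. This sketch uses math.isnan from the standard library as one possible isnan(), and my_var is a hypothetical variable name.

```python
import math

# isnan() is True only for NaN, not for ordinary numbers.
print(math.isnan(2))             # False
print(math.isnan(float("nan")))  # True

# "is not None" tests whether a variable holds something other than None.
my_var = None
first_check = my_var is not None
my_var = 0
second_check = my_var is not None
print(first_check, second_check)  # False True
```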

Answer by heltonbiker

NaN can be used as a numerical value in mathematical operations, while None cannot (or at least shouldn't).

NaN is a numeric value, as defined in the IEEE 754 floating-point standard. None is the sole instance of an internal Python type (NoneType) and would be more like "nonexistent" or "empty" than "numerically invalid" in this context.

The main "symptom" of that is that, if you perform, say, an average or a sum on an array containing NaN, even a single one, you get NaN as a result...

On the other hand, you cannot perform mathematical operations using None as an operand.
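Both behaviours can be illustrated with a short sketch; the array contents are arbitrary example values.

```python
import numpy as np

# A single NaN is enough to turn the sum and the mean into NaN.
values = np.array([1.0, 2.0, np.nan])
total = values.sum()
mean = values.mean()
print(total, mean)  # nan nan

# None is not a number at all, so arithmetic with it raises TypeError.
try:
    1 + None
    none_failed = False
except TypeError as exc:
    none_failed = True
    print("TypeError:", exc)
```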

So, depending on the case, you could use None as a way to tell your algorithm not to consider invalid or nonexistent values in computations. That would mean the algorithm should test each value to see if it is None.

NumPy has some functions that keep NaN values from contaminating your results, such as nansum and nan_to_num.
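A minimal sketch of those two functions, using an arbitrary example array and the default arguments (nan_to_num replaces NaN with 0.0 unless told otherwise):

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan])

# nansum treats NaN as zero when summing.
print(np.nansum(values))      # 3.0

# nan_to_num returns a copy with NaN replaced by 0.0 (the default).
cleaned = np.nan_to_num(values)
print(cleaned.tolist())       # [1.0, 2.0, 0.0]
```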

Answer by eswara amirthan s

Below are the differences:

  • nan belongs to the class float
  • None belongs to the class NoneType
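The two classes can be checked directly. This sketch uses np.nan as the nan value, which is an assumption about which nan the answer has in mind.

```python
import numpy as np

nan_type = type(np.nan)
none_type = type(None)

print(nan_type.__name__)          # float
print(none_type.__name__)         # NoneType
print(isinstance(np.nan, float))  # True
```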

I found the below article very helpful: https://medium.com/analytics-vidhya/dealing-with-missing-values-nan-and-none-in-python-6fc9b8fb4f31
