What is the difference between NaN and None in Python?
Notice: this page is a translated copy of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17534106/
Asked by user1083734
I am reading two columns of a csv file using pandas read_csv() and then assigning the values to a dictionary. The columns contain strings of numbers and letters. Occasionally a cell is empty. In my opinion, the value read into that dictionary entry should be None, but instead nan is assigned. Surely None is more descriptive of an empty cell, as it has a null value, whereas nan just says that the value read is not a number.
Is my understanding correct? What is the difference between None and nan? Why is nan assigned instead of None?
Also, my dictionary check for any empty cells has been using numpy.isnan():
for k, v in my_dict.iteritems():
    if np.isnan(v):
But this gives me an error saying that I cannot use this check for v. I guess it is because the check is meant to be used on an integer or float variable, not a string. If this is true, how can I check v for an "empty cell"/nan case?
Accepted answer by Andy Hayden
NaN is used as a placeholder for missing data consistently in pandas; consistency is good. I usually read/translate NaN as "missing". Also see the 'working with missing data' section in the docs.
Wes writes in the docs, under 'choice of NA-representation':
After years of production use [NaN] has proven, at least in my opinion, to be the best decision given the state of affairs in NumPy and Python in general. The special value NaN (Not-A-Number) is used everywhere as the NA value, and there are API functions isnull and notnull which can be used across the dtypes to detect NA values.
...
Thus, I have chosen the Pythonic “practicality beats purity” approach and traded integer NA capability for a much simpler approach of using a special value in float and object arrays to denote NA, and promoting integer arrays to floating when NAs must be introduced.
Note the "gotcha": an integer Series containing missing data is upcast to floats.
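The upcasting gotcha can be seen in a small sketch. The Series contents and the use of reindex to introduce a gap are my own illustrative choices, not part of the original answer, and this assumes the default NumPy-backed dtypes rather than the nullable integer dtypes added in later pandas versions.

```python
import numpy as np
import pandas as pd

# An integer Series with no missing values keeps an integer dtype.
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Introducing a missing value (here by reindexing to a label that does
# not exist) upcasts the data to float64 so that NaN can be stored.
with_gap = ints.reindex([0, 1, 2, 3])
print(with_gap.dtype)
print(with_gap.isnull().tolist())
```

The last line marks only the newly introduced label as missing.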
In my opinion the main reason to use NaN (over None) is that it can be stored with numpy's float64 dtype, rather than the less efficient object dtype; see NA type promotions.
import numpy as np
import pandas as pd

# without forcing dtype=object, pandas changes None to NaN!
s_bad = pd.Series([1, None], dtype=object)
s_good = pd.Series([1, np.nan])

In [13]: s_bad.dtype
Out[13]: dtype('O')

In [14]: s_good.dtype
Out[14]: dtype('float64')
Jeff comments (below) on this:
np.nan allows for vectorized operations; it's a float value, while None, by definition, forces object type, which basically disables all efficiency in numpy. So repeat 3 times fast: object==bad, float==good
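To make the comment concrete, here is a rough sketch contrasting a float array with an object array. The array contents and variable names are made up for illustration; the point is only that arithmetic on the object array is applied element by element at the Python level and fails when it reaches None.

```python
import numpy as np

floats = np.array([1.0, np.nan, 3.0])           # dtype float64
objects = np.array([1, None, 3], dtype=object)  # dtype object

# Arithmetic on the float array is vectorized; NaN simply propagates.
doubled = floats * 2
print(doubled)

# On the object array NumPy falls back to Python-level operations per
# element, and None does not support arithmetic.
try:
    objects * 2
    object_error = None
except TypeError as exc:
    object_error = exc
print(type(object_error).__name__)
```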
That said, many operations may still work just as well with None as with NaN (but perhaps are not supported, i.e. they may sometimes give surprising results):
In [15]: s_bad.sum()
Out[15]: 1
In [16]: s_good.sum()
Out[16]: 1.0
To answer the second question:
You should be using pd.isnull and pd.notnull to test for missing data (NaN).
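Applied to the dictionary from the question, that might look like the sketch below. The dictionary contents are hypothetical, and items() is used instead of iteritems() on the assumption of Python 3.

```python
import numpy as np
import pandas as pd

# Hypothetical dictionary mixing strings with missing values.
my_dict = {"a": "x12", "b": np.nan, "c": None, "d": "7"}

# np.isnan raises TypeError on strings, which matches the error
# described in the question.
try:
    np.isnan("x12")
    isnan_accepts_strings = True
except TypeError:
    isnan_accepts_strings = False

# pd.isnull accepts any scalar and treats both NaN and None as missing.
empty_keys = [k for k, v in my_dict.items() if pd.isnull(v)]
print(empty_keys)
```

Because pd.isnull works on strings, floats and None alike, the check needs no type test before it.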
Answer by diegoaguilar
NaN stands for Not a Number. None might stand for anything.
Answer by Stephan
The function isnan() checks whether something is "Not A Number": it returns True if the value is NaN and False otherwise, so for example isnan(2) would return False.
The conditional myVar is not None tests whether the variable holds a value other than None.
Your numpy array uses isnan() because it is intended to be an array of numbers and it initializes all elements of the array to NaN. These elements are considered "empty".
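The two checks can be compared side by side in a minimal sketch. It uses the standard library's math.isnan instead of the NumPy version, and the variable names are only illustrative.

```python
import math

nan = float("nan")

print(math.isnan(2))       # False: 2 is an ordinary number
print(math.isnan(nan))     # True

# NaN is not None, and NaN does not even compare equal to itself,
# which is why a dedicated isnan() check is needed.
print(nan is not None)     # True
print(nan == nan)          # False

my_var = None
print(my_var is not None)  # False
```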
Answer by heltonbiker
NaN can be used as a numerical value in mathematical operations, while None cannot (or at least shouldn't).
NaN is a numeric value, as defined in the IEEE 754 floating-point standard. None is a built-in Python object (of type NoneType) and would be more like "nonexistent" or "empty" than "numerically invalid" in this context.
The main "symptom" of that is that, if you compute, say, an average or a sum over an array containing NaN, even a single one, you get NaN as a result...
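A quick way to see this symptom, using a made-up NumPy array:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# A single NaN is enough to turn both results into NaN.
total = values.sum()
average = values.mean()
print(total, average)
```

Note that this describes plain NumPy arrays. pandas reductions such as Series.sum() skip NaN by default, which is why s_good.sum() in the accepted answer returns 1.0.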
On the other hand, you cannot perform mathematical operations using None as an operand.
So, depending on the case, you could use None as a way to tell your algorithm not to consider invalid or nonexistent values in computations. That would mean the algorithm should test each value to see if it is None.
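As a sketch of that idea in plain Python, with invented readings, the code below first shows that None fails as an operand and then skips None values explicitly before averaging:

```python
readings = [3.0, None, 4.5, None, 1.5]

# None cannot take part in arithmetic: this raises TypeError.
try:
    None + 1.0
    none_error = None
except TypeError as exc:
    none_error = exc

# So the algorithm tests each value and keeps only the real numbers.
valid = [r for r in readings if r is not None]
mean_valid = sum(valid) / len(valid)
print(mean_valid)
```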
Numpy has some functions that keep NaN values from contaminating your results, such as nansum and nan_to_num.
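For instance, on a small illustrative array (the values are my own):

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# nansum treats NaN as zero when summing.
nan_safe_total = np.nansum(values)
print(nan_safe_total)

# nan_to_num returns a copy with NaN replaced by 0.0 (the default).
cleaned = np.nan_to_num(values)
print(cleaned)
```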
Answer by eswara amirthan s
Below are the differences:
nan belongs to the class float
None belongs to the class NoneType
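This is easy to confirm interactively. The sketch below uses np.nan as the nan value, which is an assumption about which nan the answer means.

```python
import numpy as np

nan_type = type(np.nan)
none_type = type(None)

print(nan_type)   # <class 'float'>
print(none_type)  # <class 'NoneType'>
```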
I found the article below very helpful: https://medium.com/analytics-vidhya/dealing-with-missing-values-nan-and-none-in-python-6fc9b8fb4f31