Efficiently checking if arbitrary object is NaN in Python / numpy / pandas?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/18689512/
Asked by Dun Peal
My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.
Naively I used numpy.isnan(val), which works well unless val isn't among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:
>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type
Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?
Accepted answer by Marius
pandas.isnull() (also pd.isna(), in newer versions) checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:
NaN in numeric arrays, None/NaN in object arrays
Quick example:
import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]:
0    False
1     True
2    False
dtype: bool
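Since the question is about checking values one at a time while iterating, it may help to note that pd.isnull() also accepts scalars, including strings. A minimal sketch follows; the helper name is_missing and the sample values are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd


def is_missing(val):
    """Return True if a scalar is NaN or None (hypothetical helper, scalars only)."""
    # pd.isnull() returns a plain boolean for scalar input and does not
    # raise on strings, unlike np.isnan().
    return bool(pd.isnull(val))


# Hypothetical sample values mixing strings, floats, and None
values = ['some_string', np.nan, None, 1.5, float('nan')]
flags = [is_missing(v) for v in values]
print(flags)  # [False, True, True, False, True]
```

Passing a list or array to this helper would fail at bool(), because pd.isnull() then returns an array of booleans rather than a single value.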
The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.
Datetimes too (if you use pd.NaT you won't need to specify the dtype):
In [24]: s = pd.Series([pd.Timestamp('20130101'), np.nan, pd.Timestamp('20130102 9:30')], dtype='M8[ns]')
In [25]: s
Out[25]:
0   2013-01-01 00:00:00
1                   NaT
2   2013-01-02 09:30:00
dtype: datetime64[ns]
In [26]: pd.isnull(s)
Out[26]:
0    False
1     True
2    False
dtype: bool
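The parenthetical remark about pd.NaT can be sketched as follows: with pd.NaT in place of np.nan, the series is inferred as a datetime type without an explicit dtype argument. This is an illustrative sketch, not part of the original answer, and the variable names mask and scalar_check are made up for the example:

```python
import pandas as pd

# No dtype argument: pd.NaT lets pandas infer a datetime dtype on its own.
s = pd.Series([pd.Timestamp('20130101'), pd.NaT, pd.Timestamp('20130102 9:30')])

mask = pd.isnull(s)               # element-wise check on the series
scalar_check = pd.isnull(pd.NaT)  # pd.isnull() also recognises NaT as a scalar

print(s.dtype)
print(mask.tolist())  # [False, True, False]
print(scalar_check)   # True
```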
Answer by Hammer
Is your type really arbitrary? If you know it is just going to be an int, float, or string, you could just do
if val.dtype == float and np.isnan(val):
assuming it is wrapped in numpy, it will always have a dtype, and only float and complex can be NaN
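One way to sketch this dtype-based check is shown below. It is a variation on the one-liner above rather than the answerer's exact code: the helper name is_nan is hypothetical, np.asarray() is an added assumption so that plain Python scalars also get a dtype, and dtype.kind is used so that complex values are covered as well as floats.

```python
import numpy as np


def is_nan(val):
    """Return True only for float or complex NaN scalars (hypothetical helper)."""
    arr = np.asarray(val)  # assumption: wrap plain Python values so they have a dtype
    # dtype.kind is 'f' for floating point and 'c' for complex; only these
    # kinds can hold NaN, so np.isnan() is never called on strings or objects.
    return arr.dtype.kind in 'fc' and bool(np.isnan(arr))


print(is_nan(np.float64('nan')))  # True
print(is_nan('some_string'))      # False
print(is_nan(np.int64(3)))        # False
print(is_nan(None))               # False
```

Unlike pd.isnull(), this sketch treats None as not missing, and it is meant for scalars only.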

