Original source: http://stackoverflow.com/questions/18689512/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Efficiently checking if arbitrary object is NaN in Python / numpy / pandas?
Asked by Dun Peal
My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.
Naively I used numpy.isnan(val), which works well unless val isn't among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:
>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type
Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?
Accepted answer by Marius
pandas.isnull() (also pd.isna(), in newer versions) checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:
NaN in numeric arrays, None/NaN in object arrays
Quick example:
import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]:
0    False
1     True
2    False
dtype: bool
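The question iterates over individual values rather than whole arrays, so here is a minimal sketch of the same check applied to scalars. The values list is made up for illustration and is not from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed values, as might be met while iterating over a data set.
values = [1.5, np.nan, 'some_string', None, pd.NaT]

# pd.isna accepts a scalar of any type; no TypeError is raised for strings.
flags = [bool(pd.isna(v)) for v in values]
print(flags)  # [False, True, False, True, True]
```

Unlike np.isnan, this also treats None and pd.NaT as missing, which may or may not be what you want.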
The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.
This works for datetimes too (if you use pd.NaT instead of np.nan, you won't need to specify the dtype):
In [24]: s = pd.Series([pd.Timestamp('20130101'), np.nan, pd.Timestamp('20130102 9:30')], dtype='M8[ns]')
In [25]: s
Out[25]:
0   2013-01-01 00:00:00
1                   NaT
2   2013-01-02 09:30:00
dtype: datetime64[ns]
In [26]: pd.isnull(s)
Out[26]:
0    False
1     True
2    False
dtype: bool
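The pd.NaT variant mentioned above can be sketched as follows. This is an illustration only; the variable names are assumptions, and the exact datetime resolution pandas infers may differ between versions.

```python
import pandas as pd

# Using pd.NaT for the missing entry lets pandas infer a datetime dtype
# without an explicit dtype argument.
s = pd.Series([pd.Timestamp('20130101'), pd.NaT, pd.Timestamp('20130102 9:30')])

# Confirm a datetime dtype was inferred, then flag the missing entry.
is_datetime = pd.api.types.is_datetime64_any_dtype(s)
missing = pd.isna(s).tolist()
print(is_datetime, missing)
```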
Answer by Hammer
Is your type really arbitrary? If you know it is only ever going to be an int, float, or string, you could just do:
if val.dtype == float and np.isnan(val):
Assuming it is wrapped in numpy, it will always have a dtype, and only float and complex values can be NaN.
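One caveat with comparing the dtype to float is that it only matches float64, so a float32 or complex value would be skipped. A possible variation on the same idea, sketched here with a hypothetical helper named is_nan (not part of the original answer), uses np.issubdtype with np.inexact, which covers all floating and complex types:

```python
import numpy as np

def is_nan(val):
    """Return True only if val is a scalar float or complex NaN.

    Hypothetical helper for illustration; arrays and sequences return False.
    """
    arr = np.asarray(val)  # wrap the value so it always has a dtype
    return bool(
        arr.ndim == 0                              # scalars only
        and np.issubdtype(arr.dtype, np.inexact)   # float or complex of any width
        and np.isnan(arr)                          # safe: dtype is known numeric
    )

print(is_nan(np.nan))             # True
print(is_nan(np.float32('nan')))  # True
print(is_nan('some_string'))      # False
print(is_nan(None))               # False
```

Because the dtype check runs before np.isnan, no exception is raised for strings or other objects. Wrapping each value with np.asarray has its own cost, so this is not necessarily faster than pd.isnull.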