Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18689512/

Date: 2020-08-19 11:29:02  Source: igfitidea

Efficiently checking if arbitrary object is NaN in Python / numpy / pandas?

Tags: python, numpy, pandas

Asked by Dun Peal

My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.

Naively I used numpy.isnan(val), which works well unless val isn't among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:

>>> np.isnan('some_string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?

Accepted answer by Marius

pandas.isnull() (also available as pd.isna() in newer versions) checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:

NaN in numeric arrays, None/NaN in object arrays

Quick example:

import pandas as pd
import numpy as np
s = pd.Series(['apple', np.nan, 'banana'])
pd.isnull(s)
Out[9]: 
0    False
1     True
2    False
dtype: bool
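
Since the question iterates over individual values, it may help to see the same function applied to scalars. This is a minimal sketch, assuming a pandas version that provides pd.isna(); the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample of mixed-type values, as might come up while iterating.
values = ['some_string', np.nan, None, 1.5, pd.NaT]

# pd.isna() accepts arbitrary scalars and does not raise on strings.
flags = [bool(pd.isna(v)) for v in values]
print(flags)  # [False, True, True, False, True]
```

Note that None and pd.NaT are also reported as missing here, which may or may not be what you want if you only care about float NaN.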

The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.

This works for datetimes too (if you use pd.NaT, you won't need to specify the dtype):

In [24]: s = pd.Series([pd.Timestamp('20130101'), np.nan, pd.Timestamp('20130102 9:30')], dtype='M8[ns]')

In [25]: s
Out[25]: 
0   2013-01-01 00:00:00
1                   NaT
2   2013-01-02 09:30:00
dtype: datetime64[ns]

In [26]: pd.isnull(s)
Out[26]: 
0    False
1     True
2    False
dtype: bool
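
The pd.NaT variant mentioned above could look roughly like this. It is a sketch rather than part of the original answer, and it checks the dtype with pd.api.types.is_datetime64_any_dtype() because the exact datetime resolution pandas infers can differ between versions:

```python
import pandas as pd

# With pd.NaT as the missing marker, no explicit dtype argument is needed.
s = pd.Series([pd.Timestamp('20130101'), pd.NaT, pd.Timestamp('20130102 9:30')])

# The series is inferred as a datetime type on its own.
is_datetime = pd.api.types.is_datetime64_any_dtype(s)

# pd.isna() flags the NaT entry as missing.
missing = pd.isna(s).tolist()
print(is_datetime, missing)
```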

Answer by Hammer

Is your type really arbitrary? If you know it will only ever be an int, a float, or a string, you could just do:

 if val.dtype == float and np.isnan(val):

Assuming the value is wrapped in numpy, it will always have a dtype, and only float and complex dtypes can be NaN.
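
One way to turn this dtype check into a reusable helper is sketched below. The function name is_nan_scalar is hypothetical, and the use of np.asarray() to obtain a dtype for plain Python values is an assumption added here, not something the answer specifies. It handles scalars only:

```python
import numpy as np

def is_nan_scalar(val):
    """Return True only if the scalar `val` is a float or complex NaN."""
    # Wrap the value so that plain Python objects also get a dtype.
    arr = np.asarray(val)
    # Only floating ('f') and complex ('c') dtypes can hold NaN;
    # anything else (strings, ints, None, non-scalars) is reported as not NaN.
    if arr.ndim != 0 or arr.dtype.kind not in "fc":
        return False
    return bool(np.isnan(arr))

print(is_nan_scalar(float('nan')))   # True
print(is_nan_scalar('some_string'))  # False
```

Checking the dtype kind first avoids the TypeError from np.isnan() without relying on exception handling, at the cost of wrapping every value in an array.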
