Python - How to check if any value is NaN in a Pandas DataFrame

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/29530232/

Date: 2020-08-19 04:41:51  Source: igfitidea

How to check if any value is NaN in a Pandas DataFrame

Tags: python, pandas, dataframe, nan

Asked by hlin117

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isnull, but it returns a DataFrame of booleans, one for each element. This post right here doesn't exactly answer my question either.

Accepted answer by S Anand

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()

For example:

In [1]: import numpy as np; import pandas as pd

In [2]: df = pd.DataFrame(np.random.randn(1000,1000))

In [3]: df[df > 0.9] = np.nan

In [4]: %timeit df.isnull().any().any()
100 loops, best of 3: 14.7 ms per loop

In [5]: %timeit df.isnull().values.sum()
100 loops, best of 3: 2.15 ms per loop

In [6]: %timeit df.isnull().sum().sum()
100 loops, best of 3: 18 ms per loop

In [7]: %timeit df.isnull().values.any()
1000 loops, best of 3: 948 μs per loop

df.isnull().sum().sum() is a bit slower, but it provides additional information: the number of NaNs.
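
A minimal sketch of the idea behind this answer, on a purely illustrative DataFrame: build the boolean mask once, take its underlying NumPy array (to_numpy() is the newer spelling of .values), and reduce it with any() for a yes/no answer or sum() for the count. The helper name nan_summary is hypothetical, not part of pandas.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def nan_summary(df: pd.DataFrame) -> Tuple[bool, int]:
    """Hypothetical helper: return (has_any_nan, total_nan_count)."""
    # 2-D boolean ndarray, True wherever a value is missing
    mask = df.isnull().to_numpy()
    # any() collapses the mask to one bool; sum() counts the True entries
    return bool(mask.any()), int(mask.sum())


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})
print(nan_summary(df))  # (True, 3)
```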

Answer by jwilner

df.isnull().any().any() should do it.
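
The two chained any() calls reduce first over the rows (giving one bool per column) and then over that Series. As an alternative not mentioned in this answer, DataFrame.any also accepts axis=None, which reduces over both axes in a single call. The small frame below is purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})

per_column = df.isnull().any()          # Series: a -> False, b -> True
chained = per_column.any()              # reduce that Series to a single bool
one_call = df.isnull().any(axis=None)   # reduce rows and columns at once

print(bool(chained), bool(one_call))  # True True
```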

Answer by Andy

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the data frame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • Option 1: df.isnull().any().any() - This returns a boolean value

You already know about isnull(), which returns a DataFrame like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you call df.isnull().any(), you get one result per column, showing which columns have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() tells you whether any of the above are True:

> df.isnull().any().any()
True
  • Option 2: df.isnull().sum().sum() - This returns an integer: the total number of NaN values.

This works the same way as .any().any(): it first sums the NaN values in each column, then sums those per-column totals:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5
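
The walkthrough above uses random values, so the numbers differ on every run, but the NaN positions are fixed by the iloc assignments. The sketch below rebuilds that layout with a seeded generator (the seed is an assumption added only so the run is repeatable), checks the per-column and total counts shown above, and also lists the labels of the columns that contain NaNs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the example repeatable
df = pd.DataFrame(rng.standard_normal((10, 6)))

# Same NaN layout as in the answer
df.iloc[1:3, 1] = np.nan
df.iloc[5, 3] = np.nan
df.iloc[7:9, 5] = np.nan

per_column = df.isnull().sum()                         # NaN count per column
total = int(per_column.sum())                          # NaN count for the whole frame
nan_columns = df.columns[df.isnull().any()].tolist()   # labels of columns with NaNs

print(per_column.tolist())  # [0, 2, 0, 1, 0, 2]
print(total)                # 5
print(nan_columns)          # [1, 3, 5]
```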

Answer by andrewwowens

Depending on the type of data you're dealing with, you could also just get the value counts of each column while performing your EDA by setting dropna to False.

for col in df:
    print(df[col].value_counts(dropna=False))

This works well for categorical variables, but not so well when a column has many unique values.
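
A small illustration, using a hypothetical categorical column named color: with dropna=False, value_counts reports missing entries as their own NaN row, which is what makes this loop useful during EDA. With the default dropna=True they are left out of the counts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", np.nan, "red", np.nan]})

for col in df:
    counts = df[col].value_counts(dropna=False)
    print(counts)

with_nan = df["color"].value_counts(dropna=False)
without_nan = df["color"].value_counts()  # default dropna=True drops NaN

# Number of missing values, read off the NaN entry of the index
missing = int(with_nan[with_nan.index.isna()].sum())
print(missing)  # 2
```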

Answer by hobs

If you need to know how many rows there are with "one or more NaNs":

df.isnull().T.any().T.sum()

Or if you need to pull out these rows and examine them:

nan_rows = df[df.isnull().T.any().T]
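
How the transposes work, shown on a small illustrative frame: df.isnull().T turns the original rows into columns, so .any() (which reduces over the index by default) yields one boolean per original row. The trailing .T is a no-op on that Series and could be dropped without changing the result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, np.nan, 7.0, 8.0]},
    index=["w", "x", "y", "z"],
)

row_has_nan = df.isnull().T.any().T   # boolean Series indexed by the original rows
n_rows = int(row_has_nan.sum())       # how many rows contain at least one NaN
nan_rows = df[row_has_nan]            # the rows themselves, for inspection

print(n_rows)                   # 2
print(nan_rows.index.tolist())  # ['w', 'x']
```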

Answer by Marshall Farrier

Since pandas has to find this out for DataFrame.dropna(), I took a look at how it is implemented and discovered that it makes use of DataFrame.count(), which counts all non-null values in the DataFrame (cf. the pandas source code). I haven't benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.

Answer by yazhi

Since nobody has mentioned it yet, there is also an attribute called hasnans.

df[i].hasnans is True if one or more of the values in the pandas Series are NaN, and False otherwise. Note that it is a property, not a method, so it is used without parentheses.

Tested with pandas versions 0.19.2 and 0.20.2.
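
hasnans is defined on Series (and Index), not on DataFrame, so one way to apply it to a whole frame is to check each column in turn. A minimal sketch on an illustrative frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})

# hasnans is a property, so there are no parentheses
per_column = {col: bool(df[col].hasnans) for col in df.columns}
any_nan = any(per_column.values())

print(per_column)  # {'a': False, 'b': True}
print(any_nan)     # True
```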

Answer by Ankit

Adding to hobs's brilliant answer (I am very new to Python and pandas, so please point out if I am wrong):

To find out which rows have NaNs:

nan_rows = df[df.isnull().any(axis=1)]

This performs the same operation without transposing: passing axis=1 to any() checks whether True is present in each row.
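
A quick check on an illustrative frame that the two spellings select the same rows: any(axis=1) reduces across the columns of each row, which is what the double transpose achieved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

by_axis = df.isnull().any(axis=1)    # one bool per row
by_transpose = df.isnull().T.any()   # same result via transposition

assert by_axis.equals(by_transpose)
nan_rows = df[by_axis]
print(nan_rows.index.tolist())  # [0, 1]
```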

Answer by u5985526

Use math.isnan(x): it returns True if x is NaN (not a number) and False otherwise. Note that it works on a single scalar value, not on a whole DataFrame.
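
Because math.isnan takes a single number, applying it to a DataFrame means visiting each value. A sketch, assuming every column is numeric (math.isnan raises TypeError on strings), on an illustrative frame; the vectorized isnull() approaches are usually much faster on large frames.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})

print(math.isnan(float("nan")))  # True
print(math.isnan(1.5))           # False

# Visit every cell; only valid when all values are numbers
has_nan = any(math.isnan(x) for x in df.to_numpy().ravel())
print(has_nan)  # True
```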

Answer by Ihor Ivasiuk

To find out which rows have NaNs in a specific column:

nan_rows = df[df['name column'].isnull()]
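
For example, with a hypothetical column named score standing in for 'name column' (substitute your own column name), the boolean mask from one column filters the whole frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob", "cy"], "score": [90.0, np.nan, 75.0]})

# Boolean mask on one column, used to filter the whole frame
nan_rows = df[df["score"].isnull()]
print(nan_rows)
```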