How to check if any value is NaN in a Pandas DataFrame
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29530232/
Asked by hlin117
In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?
I know about the function pd.isnan, but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question either.
Accepted answer by S Anand
jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:
df.isnull().values.any()
For example:
In [2]: df = pd.DataFrame(np.random.randn(1000,1000))
In [3]: df[df > 0.9] = np.nan
In [4]: %timeit df.isnull().any().any()
100 loops, best of 3: 14.7 ms per loop
In [5]: %timeit df.isnull().values.sum()
100 loops, best of 3: 2.15 ms per loop
In [6]: %timeit df.isnull().sum().sum()
100 loops, best of 3: 18 ms per loop
In [7]: %timeit df.isnull().values.any()
1000 loops, best of 3: 948 μs per loop
df.isnull().sum().sum() is a bit slower, but of course has additional information: the number of NaNs.
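As a minimal sketch of the comparison above, the snippet below checks that the fast and slow variants agree on the same data. The seed and the variable names are assumptions made for illustration; actual timings depend on the machine and the pandas version, so none are shown here.

```python
import numpy as np
import pandas as pd

# Fixed seed, chosen only so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 1000)))
df[df > 0.9] = np.nan

mask = df.isnull()

# Working on the underlying NumPy array avoids the per-column reduction.
has_nan_fast = bool(mask.values.any())
has_nan_slow = bool(mask.any().any())

count_fast = int(mask.values.sum())
count_slow = int(mask.sum().sum())

print(has_nan_fast == has_nan_slow)
print(count_fast == count_slow)
```

Both pairs should agree; the choice between them is about speed and whether the count is needed.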
Answer by jwilner
df.isnull().any().any() should do it.
Answer by Andy
You have a couple of options.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan
Now the data frame looks something like this:
0 1 2 3 4 5
0 0.520113 0.884000 1.260966 -0.236597 0.312972 -0.196281
1 -0.837552 NaN 0.143017 0.862355 0.346550 0.842952
2 -0.452595 NaN -0.420790 0.456215 1.203459 0.527425
3 0.317503 -0.917042 1.780938 -1.584102 0.432745 0.389797
4 -0.722852 1.704820 -0.113821 -1.466458 0.083002 0.011722
5 -0.622851 -0.251935 -1.498837 NaN 1.098323 0.273814
6 0.329585 0.075312 -0.690209 -3.807924 0.489317 -0.841368
7 -1.123433 -1.187496 1.868894 -2.046456 -0.949718 NaN
8 1.133880 -0.110447 0.050385 -1.158387 0.188222 NaN
9 -0.513741 1.196259 0.704537 0.982395 -0.585040 -1.693810
- Option 1: df.isnull().any().any() - This returns a boolean value
You know of isnull(), which returns a DataFrame like this:
0 1 2 3 4 5
0 False False False False False False
1 False True False False False False
2 False True False False False False
3 False False False False False False
4 False False False False False False
5 False False False True False False
6 False False False False False False
7 False False False False False True
8 False False False False False True
9 False False False False False False
If you make it df.isnull().any(), you can find just the columns that have NaN values:
0 False
1 True
2 False
3 True
4 False
5 True
dtype: bool
One more .any() will tell you if any of the above are True:
> df.isnull().any().any()
True
- Option 2: df.isnull().sum().sum() - This returns an integer of the total number of NaN values:
This operates the same way as .any().any() does, by first giving a summation of the number of NaN values in each column, then the summation of those values:
df.isnull().sum()
0 0
1 2
2 0
3 1
4 0
5 2
dtype: int64
Finally, to get the total number of NaN values in the DataFrame:
df.isnull().sum().sum()
5
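The two options can be tried end to end with a deterministic frame. This is only a sketch: zeros stand in for the random values so the result does not depend on a seed, and the names cols_with_nan, nan_per_column and total_nan are made up for illustration.

```python
import numpy as np
import pandas as pd

# Zeros instead of random numbers, with NaN in the same positions as above.
df = pd.DataFrame(np.zeros((10, 6)))
df.iloc[1:3, 1] = np.nan   # rows 1 and 2 of column 1
df.iloc[5, 3] = np.nan     # row 5 of column 3
df.iloc[7:9, 5] = np.nan   # rows 7 and 8 of column 5

# Option 1: a single boolean for the whole frame.
has_nan = bool(df.isnull().any().any())

# The intermediate per-column result also tells you which columns are affected.
cols_with_nan = df.columns[df.isnull().any().to_numpy()].tolist()

# Option 2: per-column counts, then the grand total.
nan_per_column = df.isnull().sum()
total_nan = int(nan_per_column.sum())

print(has_nan)         # True
print(cols_with_nan)   # [1, 3, 5]
print(total_nan)       # 5
```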
Answer by andrewwowens
Depending on the type of data you're dealing with, you could also just get the value counts of each column while performing your EDA by setting dropna to False.
for col in df:
    print(df[col].value_counts(dropna=False))
Works well for categorical variables, not so much when you have many unique values.
Answer by hobs
If you need to know how many rows there are with "one or more NaNs":
df.isnull().T.any().T.sum()
Or if you need to pull out these rows and examine them:
nan_rows = df[df.isnull().T.any().T]
Answer by Marshall Farrier
Since pandas has to find this out for DataFrame.dropna(), I took a look to see how they implement it and discovered that they made use of DataFrame.count(), which counts all non-null values in the DataFrame. Cf. pandas source code. I haven't benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.
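One way to apply the count() idea, as a rough sketch: compare the non-null count of each column with the number of rows. This illustrates the idea only and is not a copy of what pandas does internally; the column names and variables are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# count() returns the number of non-null values per column.
non_null_per_column = df.count()

# A column holds at least one NaN when its count is below the row count.
has_nan = bool((non_null_per_column < len(df)).any())

# The total number of NaNs is the number of cells minus the non-null cells.
total_nan = int(df.size - non_null_per_column.sum())

print(has_nan)     # True
print(total_nan)   # 1
```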
Answer by yazhi
Since no one has mentioned it, there is another attribute called hasnans.
df[i].hasnans returns True if one or more of the values in the pandas Series is NaN, and False if not. Note that it's not a function.
pandas version '0.19.2' and '0.20.2'
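A short sketch of hasnans in use. It is an attribute of a Series, so for a whole DataFrame it has to be checked column by column; the column names and the any_nan variable below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, 3.0]})

# hasnans is a property, so there are no parentheses.
print(df["a"].hasnans)   # True
print(df["b"].hasnans)   # False

# Checking every column gives a single answer for the whole frame.
any_nan = any(df[col].hasnans for col in df.columns)
print(any_nan)           # True
```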
Answer by Ankit
Adding to hobs' brilliant answer. I am very new to Python and Pandas, so please point out if I am wrong.
To find out which rows have NaNs:
nan_rows = df[df.isnull().any(axis=1)]
would perform the same operation without the need for transposing by specifying the axis of any() as 1 to check if 'True' is present in rows.
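To check that claim, the sketch below builds the row mask both ways, with the transposes from hobs' answer and with the axis passed by keyword, and compares them. The sample data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, 6.0]})

# Transpose-based row mask, as in hobs' answer.
mask_transpose = df.isnull().T.any().T

# Same mask, reducing across columns directly.
mask_axis = df.isnull().any(axis=1)

same = bool((mask_transpose == mask_axis).all())

nan_rows = df[mask_axis]              # rows with at least one NaN
n_rows_with_nan = int(mask_axis.sum())

print(same)              # True
print(n_rows_with_nan)   # 2
```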
Answer by u5985526
Just use math.isnan(x). It returns True if x is a NaN (not a number), and False otherwise.
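Note that math.isnan works on one number at a time, so applying it to a DataFrame means visiting each value. A minimal sketch, assuming an all-numeric frame (non-numeric values such as strings would raise a TypeError); the data is made up:

```python
import math

import numpy as np
import pandas as pd

print(math.isnan(float("nan")))   # True
print(math.isnan(1.5))            # False

df = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, 3.0]})

# Flatten the values and test them one by one; this is a plain Python loop,
# so it is likely much slower than df.isnull().values.any() on large frames.
has_nan = any(math.isnan(v) for v in df.to_numpy().ravel())
print(has_nan)                    # True
```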
Answer by Ihor Ivasiuk
To find out which rows have NaNs in a specific column:
nan_rows = df[df['name column'].isnull()]
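For instance, with a small made-up frame (the column names "name" and "score" are hypothetical), the filter keeps only the rows where the chosen column is missing and ignores NaNs in other columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", None, "c"],
    "score": [1.0, 2.0, np.nan],
})

# isnull() treats None in an object column as missing, just like NaN.
nan_rows = df[df["name"].isnull()]

print(nan_rows.index.tolist())   # [1]
```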