Python: How to find which columns contain any NaN value in a Pandas DataFrame

Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36226083/


How to find which columns contain any NaN value in Pandas dataframe

python pandas dataframe nan

Asked by denvar

Given a pandas dataframe containing possible NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?

Answered by MaxU

UPDATE: using Pandas 0.22.0

Newer Pandas versions have the methods 'DataFrame.isna()' and 'DataFrame.notna()':

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

As a list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

To select those columns (the ones containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
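
For anyone who wants to reproduce the session above outside IPython, here is a minimal, self-contained sketch. It rebuilds the sample frame by hand from the printed output (the construction itself is an assumption; the answer only shows the printed frame) and runs the same three expressions. The names has_nan, nan_columns and nan_frame are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the printed output above (assumed construction).
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column holds a NaN.
has_nan = df.isna().any()

# The matching column names as a plain Python list.
nan_columns = df.columns[has_nan].tolist()

# Sub-frame holding only the columns that contain at least one NaN.
nan_frame = df.loc[:, has_nan]

print(nan_columns)
```

Computing the boolean mask once and reusing it avoids scanning the frame three times, which may matter on wide frames.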


OLD answer:

Try using isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

Or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

To select a subset (all columns containing at least one NaN value):

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

Answered by Matheus

You can use df.isnull().sum(). It lists every column together with its total number of NaNs.
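
As a rough illustration of the per-column counts, the sketch below runs df.isnull().sum() on a small hypothetical frame and then keeps only the columns with a nonzero count. The sample data and the filtering and sorting step are assumptions added for the example, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# Number of NaN values in each column (columns without NaNs show 0).
all_counts = df.isnull().sum()

# Keep only the columns with at least one NaN, largest count first.
nan_counts = all_counts[all_counts > 0].sort_values(ascending=False)

print(nan_counts)
```

The index of nan_counts is then the list of offending columns, and its values say how bad each one is.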

Answered by Tom Wattley

I had too many columns to inspect visually on the screen, so here is a short list comprehension that filters and returns the offending columns:

nan_cols = [i for i in df.columns if df[i].isnull().any()]

In case that's helpful to anyone.
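
As a quick sanity check, the sketch below runs the list comprehension on a hypothetical frame and compares it with the vectorized expression from the earlier answer; on this data both give the same list. The sample frame and the name vectorized are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a float column and two object columns.
df = pd.DataFrame({
    "x": [1.0, np.nan],
    "y": ["u", "v"],
    "z": [None, "w"],
})

# List comprehension from the answer: checks one column at a time.
nan_cols = [i for i in df.columns if df[i].isnull().any()]

# Vectorized form from the earlier answer: one mask over the whole frame.
vectorized = df.columns[df.isnull().any()].tolist()

print(nan_cols, vectorized)
```

Note that None in an object column counts as missing, just like NaN in a float column.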

Answered by Pradeep Singh

In datasets with a large number of columns, it is even better to see how many columns contain null values and how many don't.

print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))

print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))

print("Total no. of columns in the dataframe")
print(len(df.columns))

For example, my dataframe contained 82 columns, of which 19 contained at least one null value.

Further, you can also automatically remove columns and rows, depending on which has more null values. Here is the code that does this:

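# Drop every column whose NaN count exceeds the number of columns (the threshold used in this answer)
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
# Then drop every remaining row that still contains a NaN
df = df.dropna(axis=0).reset_index(drop=True)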

Note: The above code removes all of your null values. If you need to keep them, process them beforehand.
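
To make the effect of those two lines concrete, here is a small sketch on a hypothetical frame. It also shows a variant that uses an explicit fraction of missing values as the column threshold. That variant is a swapped-in alternative, not the answer's code, and the 0.5 cutoff and the name max_nan_fraction are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2 columns, 4 rows; "mostly_nan" has 3 NaNs, "few_nan" has 1.
df = pd.DataFrame({
    "mostly_nan": [np.nan, np.nan, np.nan, 1.0],
    "few_nan": [1.0, 2.0, np.nan, 4.0],
})

# Rule from the answer: drop columns whose NaN count exceeds the number of columns.
# Here 3 > 2 drops "mostly_nan", while 1 > 2 is False, so "few_nan" stays.
step1 = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
# Drop the rows that still contain a NaN and renumber the index.
cleaned = step1.dropna(axis=0).reset_index(drop=True)

# Alternative (assumption): drop columns where more than half the values are missing.
max_nan_fraction = 0.5
keep = df.columns[df.isna().mean() <= max_nan_fraction]
cleaned_alt = df[keep].dropna(axis=0).reset_index(drop=True)

print(cleaned)
```

A fraction-based cutoff does not depend on how many columns the frame happens to have, which may make it easier to reason about than comparing a row count with a column count.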

Answered by Frank

I use these three lines of code to print the names of the columns that contain at least one null value:

for column in dataframe:
    if dataframe[column].isnull().any():
        print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))

Answered by prosti

Both of these should work:

df.isnull().sum()
df.isna().sum()

The DataFrame methods isna() and isnull() are completely identical.

Note: Empty strings '' are reported as False (they are not considered NA).
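
The sketch below illustrates both points on a hypothetical frame: isna() and isnull() return the same mask, and an empty string is not counted as missing. Replacing empty strings with NaN before counting is one possible workaround, offered here as an assumption rather than as part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one empty string and one real missing value per column.
df = pd.DataFrame({
    "name": ["alice", "", None],
    "score": [1.0, np.nan, 3.0],
})

# isna() and isnull() produce identical boolean frames.
same = df.isna().equals(df.isnull())

# The empty string in "name" is not counted; only None is.
counts_before = df.isna().sum()

# Possible workaround: turn empty strings into NaN first, then count.
counts_after = df.replace("", np.nan).isna().sum()

print(same)
print(counts_before)
print(counts_after)
```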