Original source: http://stackoverflow.com/questions/36226083/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to find which columns contain any NaN value in Pandas dataframe
Asked by denvar
Given a pandas dataframe containing possible NaN values scattered here and there:
Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?
Answered by MaxU
UPDATE: using Pandas 0.22.0
Newer Pandas versions have the new methods DataFrame.isna() and DataFrame.notna().
In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool
As a list of columns:
In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
To select those columns (containing at least one NaN value):
In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
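The session above can be reproduced as a standalone script. This is only a minimal sketch: the frame is rebuilt by hand from the printed output, and the names nan_mask, nan_cols and subset are illustrative rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame by hand from the printed output above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
nan_mask = df.isna().any()

# Column names as a plain Python list.
nan_cols = df.columns[nan_mask].tolist()

# Sub-frame holding only the columns that contain at least one NaN.
subset = df.loc[:, nan_mask]

print(nan_cols)
print(subset.shape)
```

Keeping the boolean mask in its own variable avoids recomputing isna() when both the name list and the sub-frame are needed.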
OLD answer:
Try using isnull():
In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool
Or, as @root proposed, a clearer version:
In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool
In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
To select a subset (all columns containing at least one NaN value):
In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
Answered by Matheus
You can use df.isnull().sum(). It lists every column together with its total number of NaNs.
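As a rough illustration of the per-column counts, the sketch below builds a small hypothetical frame and keeps only the columns whose count is non-zero, sorted from most to fewest NaNs. The frame contents and the names counts and nonzero are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: "a" has one NaN, "b" has two, "c" has none.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0, 1.0],
    "b": [7.0, np.nan, np.nan, 7.0],
    "c": [0, 4, 4, 0],
})

# Number of NaNs per column (a Series indexed by column name).
counts = df.isnull().sum()

# Keep only columns with at least one NaN, largest count first.
nonzero = counts[counts > 0].sort_values(ascending=False)

print(nonzero)
```

Filtering out the zero counts can make the result easier to scan when the frame has many columns.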
Answered by Tom Wattley
I had a problem where I had too many columns to inspect visually on screen, so here is a short list comprehension that filters and returns the offending columns:
nan_cols = [i for i in df.columns if df[i].isnull().any()]
In case that's helpful to anyone.
Answered by Pradeep Singh
In datasets with a large number of columns, it's even better to see how many columns contain null values and how many don't.
print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))
print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))
print("Total no. of columns in the dataframe")
print(len(df.columns))
For example, my dataframe contained 82 columns, of which 19 contained at least one null value.
Further, you can also automatically remove columns and rows depending on which has more null values. The code below first drops every column whose NaN count exceeds the number of columns in the frame, then drops any remaining rows that contain a NaN:
df = df.drop(df.columns[df.isna().sum()>len(df.columns)],axis = 1)
df = df.dropna(axis = 0).reset_index(drop=True)
Note: The code above removes all of your null values. If you need to keep them, handle them before running it.
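A related variant, which is not what the code above does, is to drop columns by the fraction of missing values rather than by comparing the NaN count with the number of columns. The sketch below assumes a hypothetical cutoff named max_missing_fraction and a made-up frame; both are for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one mostly-empty column.
df = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "some_missing": [1.0, np.nan, 3.0, 4.0],
    "complete": [1, 2, 3, 4],
})

# Hypothetical cutoff: drop columns with more than half their values missing.
max_missing_fraction = 0.5

# The mean of a boolean mask is the fraction of True values per column.
missing_fraction = df.isna().mean()
cols_to_drop = missing_fraction[missing_fraction > max_missing_fraction].index

# Drop the sparse columns first, then any rows that still contain a NaN.
trimmed = df.drop(columns=cols_to_drop)
cleaned = trimmed.dropna(axis=0).reset_index(drop=True)

print(list(cleaned.columns))
print(len(cleaned))
```

Dropping sparse columns before rows tends to preserve more rows, since a single mostly-empty column would otherwise cause nearly every row to be discarded.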
Answered by Frank
I use these three lines of code to print out the names of the columns that contain at least one null value:
for column in dataframe:
    if dataframe[column].isnull().any():
        print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))
Answered by prosti
Both of these should work:
df.isnull().sum()
df.isna().sum()
The DataFrame methods isna() and isnull() are completely identical.
Note: An empty string '' is not considered NA, so isna() returns False for it.
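Both notes above can be checked on a small frame. This is a minimal sketch with made-up data; converting empty strings to NaN with replace() is one possible way to have them counted as missing, shown here as an assumption rather than as part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "x" mixes an empty string, a normal string and None.
df = pd.DataFrame({"x": ["", "a", None], "y": [1.0, np.nan, 3.0]})

# isna() and isnull() produce the same boolean frame.
same = df.isna().equals(df.isnull())

# The empty string is not flagged as missing, but None is.
before = df["x"].isna().tolist()

# After replacing '' with NaN, the empty entry is flagged as well.
after = df["x"].replace("", np.nan).isna().tolist()

print(same, before, after)
```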