Python: How to find which columns in a Pandas dataframe contain any NaN values

Question

Asked by denvar

Given a pandas dataframe containing possible NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?

Answer 1

Answered by MaxU

UPDATE: using Pandas 0.22.0

Newer Pandas versions have the new methods DataFrame.isna() and DataFrame.notna():

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

As a list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
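
For anyone who wants to reproduce the session above outside IPython, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values printed above, and the names has_nan and nan_columns are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (values copied from the printed output above).
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_nan = df.isna().any()

# Use the mask to pick the matching column labels, then convert to a plain list.
nan_columns = df.columns[has_nan].tolist()
print(nan_columns)  # ['a', 'b']
```

The intermediate has_nan Series is worth keeping around if you also need the columns without NaNs: df.columns[~has_nan] gives the complement.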

To select those columns (the ones containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

OLD answer:

Try using isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

Or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

To select a subset, namely all columns containing at least one NaN value:

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

Answer 2

Answered by Matheus

You can use df.isnull().sum(). It lists every column together with the total number of NaNs in that column.
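
As a rough illustration of how those counts can be narrowed down to just the affected columns, here is a small sketch. The sample frame and the names nan_counts and nan_counts_nonzero are made up for this example.

```python
import numpy as np
import pandas as pd

# Small made-up frame: "a" has one NaN, "b" has two, "c" has none.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0, 1.0],
    "b": [7.0, np.nan, np.nan, 7.0],
    "c": [0, 4, 4, 0],
})

# Number of NaNs per column (a Series indexed by column name).
nan_counts = df.isnull().sum()

# Keep only the columns that actually have NaNs, worst offenders first.
nan_counts_nonzero = nan_counts[nan_counts > 0].sort_values(ascending=False)
print(nan_counts_nonzero)
```

Filtering on nan_counts > 0 keeps the output short when the frame has many clean columns.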

Answer 3

Answered by Tom Wattley

I had too many columns to inspect visually on screen, so here is a short list comprehension that filters and returns the offending columns:

nan_cols = [i for i in df.columns if df[i].isnull().any()]

In case that's helpful to anyone.

Answer 4

Answered by Pradeep Singh

In datasets with a large number of columns, it's even better to see how many columns contain null values and how many don't.

print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))

print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))

print("Total no. of columns in the dataframe")
print(len(df.columns))

For example, my dataframe had 82 columns, of which 19 contained at least one null value.

Further, you can also automatically remove columns and rows depending on which has more null values. The code below drops every column whose null count exceeds the number of columns in the dataframe, then drops every remaining row that still contains a null value:

df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
df = df.dropna(axis=0).reset_index(drop=True)

Note: The code above removes all null values from the dataframe. If you need to keep any of them, process them beforehand.
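
To make the effect of those two steps concrete, here is a small sketch on a made-up frame. The column names, the values, and the helper names null_counts, cols_to_drop and cleaned are assumptions for illustration; the threshold (null count greater than the number of columns) mirrors the snippet above and may not suit every dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame with 3 columns and 6 rows:
# "x" has 4 nulls, "y" has 1 null, "z" has none.
df = pd.DataFrame({
    "x": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0],
    "y": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    "z": [1, 2, 3, 4, 5, 6],
})

# Null count per column.
null_counts = df.isna().sum()

# Step 1: columns with more nulls than there are columns (here, more than 3).
cols_to_drop = null_counts[null_counts > len(df.columns)].index

# Step 2: drop those columns, then drop any row that still has a null.
cleaned = df.drop(columns=cols_to_drop).dropna(axis=0).reset_index(drop=True)
print(cleaned)
```

Here only "x" crosses the threshold, so it is dropped as a column, while the single null in "y" costs one row instead of the whole column.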

Answer 5

Answered by Frank

I use these three lines of code to print the names of the columns that contain at least one null value:

for column in dataframe:
    if dataframe[column].isnull().any():
        print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))

Answer 6

Answered by prosti

Both of these should work:

df.isnull().sum()
df.isna().sum()

The DataFrame methods isna() and isnull() are completely identical.

Note: An empty string '' is not considered NA, so isna() returns False for it.
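
The empty-string behaviour is easy to check with a short sketch. The frame below is made up, and converting '' to NaN with replace() is just one possible way to handle it, not something the answer itself prescribes.

```python
import numpy as np
import pandas as pd

# Made-up frame: "name" holds a real string, an empty string, and None.
df = pd.DataFrame({
    "name": ["alice", "", None],
    "score": [1.0, np.nan, 3.0],
})

# The empty string is not flagged as missing; None is.
print(df["name"].isna().tolist())  # [False, False, True]

# One option if empty strings should count as missing: convert them to NaN first.
cleaned = df.replace("", np.nan)
print(cleaned["name"].isna().tolist())  # [False, True, True]
```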
