Finding empty or NaN entries in a Pandas DataFrame

Question

Asked by edesz

I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?

Answer 1

Answered by unutbu

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Values that are empty strings can be found with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that applymap calls a Python function once for each cell of the DataFrame. That can be slow for a large DataFrame, so it is better to arrange for all the blank cells to contain NaN instead, so that you can use pd.isnull.
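One way to arrange for blank cells to hold NaN is to replace empty strings before the null check. The following is a minimal sketch, not part of the original answer; the small frame and the names cleaned and missing are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: one empty string and one NaN.
df = pd.DataFrame({
    "A1": ["text76", "", "audio"],
    "A2": [1.54, 0.14, np.nan],
})

# Turn empty strings into NaN so one vectorized null check covers both cases.
cleaned = df.replace("", np.nan)

# Positional (row, column) pairs of every missing cell, in row-major order.
rows, cols = np.where(cleaned.isnull().to_numpy())
missing = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(missing)  # [(1, 0), (2, 1)]
```

If the blanks may also be whitespace-only strings, a regex replacement such as df.replace(r'^\s*$', np.nan, regex=True) is one option.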

Answer 2

Answered by Vyachez

Try this:

df[df['column_name'] == ''].index

and for NaNs you can try:

pd.isna(df['column_name'])
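The two checks above can be combined into a single boolean mask to get the index labels directly. This is a small sketch on a made-up column; column_name, mask and missing_index are placeholder names.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one empty string and one NaN.
df = pd.DataFrame({"column_name": ["a", "", np.nan, "d"]})
col = df["column_name"]

# True where the cell is NaN or an empty string.
mask = col.isna() | col.eq("")

# Index labels of the rows with a missing entry.
missing_index = df.index[mask].tolist()
print(missing_index)  # [1, 2]
```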

Answer 3

Answered by jeremy_rutman

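Lately I've resorted to

df[(df[column_name].notnull()) & (df[column_name] != u'')].index

That filters out both null and empty-string cells in one go, returning the index of the rows that actually hold a value. Negate the combined mask with ~ to get the rows where the entry is missing instead.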

Answer 4

Answered by lahoffm

Partial solution: for a single string column, tmp = df['A1'].fillna(''); isEmpty = tmp == '' gives a boolean Series that is True where there are empty strings or NaN values.

Answer 5

Answered by Shara

To obtain all the rows that contain an empty cell in a particular column:

DF_new_row=DF_raw.loc[DF_raw['columnname']=='']

This gives the subset of DF_raw that satisfies the condition.

Answer 6

Answered by Alexander

Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together using the bitwise OR operator |.

Sum along axis 0 to find the columns with missing data, then sum along axis 1 to find the index locations of the rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)

>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501         
6       NaN      NaN
7       NaN      NaN
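For comparison, the same idea can be written with an explicit boolean mask and .any() instead of sums. This is a sketch under assumptions: the contents of df2 below are invented to resemble the output above, and the values are stored as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical frame resembling the question's data (values kept as strings).
df2 = pd.DataFrame(
    {
        "A1": ["system", "Fifty", "audio", "Twenty"],
        "A2": ["", "-0.031721", "-0.508501", np.nan],
        "A3": ["1.100353", "1.434273", "", np.nan],
    },
    index=[2, 3, 5, 6],
)

# True where a cell is NaN or an empty string.
mask = df2.isnull() | df2.eq("")

# Columns and rows that contain at least one missing cell.
missing_cols = mask.columns[mask.any(axis=0)].tolist()
missing_rows = mask.index[mask.any(axis=1)].tolist()

print(missing_cols)  # ['A2', 'A3']
print(missing_rows)  # [2, 5, 6]
```

df2.loc[missing_rows, missing_cols] then selects the same kind of sub-frame as shown above.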

Answer 7

Answered by saias

Another option, which covers cells that may contain several spaces, is Python's str.isspace() method.

df[df.col_name.apply(lambda x: x.isspace() == False)]  # keeps only rows that are not whitespace-only

To also exclude NaN values (via the .str accessor, since calling x.isspace() on a float NaN raises an AttributeError):

df[(df.col_name.str.isspace() == False) & (~df.col_name.isna())]
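Note that ''.isspace() is False in Python, so an isspace()-based filter does not flag a truly empty string. A variant that treats empty strings, whitespace-only strings and NaN alike is to strip first and compare with ''. This is only a sketch on invented data; is_blank and the sample values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical column: a whitespace-only string, an empty string and a NaN.
df = pd.DataFrame({"col_name": ["audio", "   ", "", np.nan, "Twenty"]})

# Strip surrounding whitespace; NaN stays NaN under the .str accessor.
stripped = df["col_name"].str.strip()

# Blank means empty after stripping, or missing altogether.
is_blank = stripped.eq("") | df["col_name"].isna()

blank_index = df.index[is_blank].tolist()
kept_values = df.loc[~is_blank, "col_name"].tolist()
print(blank_index)  # [1, 2, 3]
print(kept_values)  # ['audio', 'Twenty']
```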

Answer 8

Answered by Mohamed Abdelsalam

You can also do this:

text_empty = df['column name'].str.len() == 0

df.loc[text_empty].index

The result is the index of the rows whose entry in that column is an empty string.
