Find empty or NaN entry in Pandas Dataframe
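Notice: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow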
Original URL: http://stackoverflow.com/questions/27159189/
Asked by edesz
I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.
Here is a dataframe that I am working with:
cl_id  a          c          d          e        A1         A2         A3
    0  1  -0.419279   0.843832  -0.530827    text76   1.537177  -0.271042
    1  2   0.581566   2.257544   0.440485    dafN_6   0.144228   2.362259
    2  3  -1.259333   1.074986   1.834653    system              1.100353
    3  4  -1.279785   0.272977   0.197011     Fifty  -0.031721   1.434273
    4  5   0.578348   0.595515   0.553483   channel   0.640708   0.649132
    5  6  -1.549588  -0.198588   0.373476     audio  -0.508501
    6  7   0.172863   1.874987   1.405923    Twenty        NaN        NaN
    7  8  -0.149630  -0.502117   0.315323  file_max        NaN        NaN
NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.
If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?
Answered by unutbu
np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:
In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))
In [155]: df.iloc[2,7]
Out[155]: nan
In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]
Finding values which are empty strings could be done with applymap (deprecated since pandas 2.1 in favor of DataFrame.map):
In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))
Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
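As a minimal sketch of that suggestion, empty strings can be converted to NaN with DataFrame.replace, after which a single pd.isnull call locates both kinds of missing cell. The small frame below is made up for illustration and is not the asker's data; the variable names cleaned and missing are likewise assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one empty string and three NaN cells.
df = pd.DataFrame({
    "e": ["text76", "system", "audio"],
    "A2": [1.537177, "", np.nan],
    "A3": [-0.271042, 1.100353, np.nan],
})

# Turn empty strings into NaN so one vectorized check covers both cases.
cleaned = df.replace("", np.nan)

# Positional (row, column) indices of every missing cell.
rows, cols = np.where(pd.isnull(cleaned))

# Translate positions into (index label, column name) pairs.
missing = [(cleaned.index[r], cleaned.columns[c]) for r, c in zip(rows, cols)]
print(missing)
```

This avoids the per-cell Python call that applymap makes, at the cost of no longer distinguishing cells that were originally empty strings from cells that were originally NaN.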
Answered by Vyachez
Try this:
df[df['column_name'] == ''].index
and for NaNs you can try:
pd.isna(df['column_name'])
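The two checks can also be combined for a single column. The following is only a sketch on a made-up column (the data and the names empty_idx, nan_idx and either_idx are assumptions), showing how each mask turns into index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing real values, an empty string and a NaN.
df = pd.DataFrame({"column_name": ["a", "", np.nan, "b"]})

col = df["column_name"]
empty_idx = df[col == ""].index                    # labels of rows holding ''
nan_idx = df[pd.isna(col)].index                   # labels of rows holding NaN
either_idx = df[(col == "") | pd.isna(col)].index  # labels of rows holding either

print(list(empty_idx), list(nan_idx), list(either_idx))
```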
Answered by jeremy_rutman
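I've resorted to
df[ (df[column_name].notnull()) & (df[column_name]!=u'') ].index
lately. That filters out both null and empty-string cells in one go, leaving the index of the rows that hold a real value. To get the index of the null or empty-string rows instead, invert the condition:
df[ (df[column_name].isnull()) | (df[column_name]==u'') ].index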
Answered by lahoffm
Partial solution: for a single string column, tmp = df['A1'].fillna(''); isEmpty = tmp=='' gives a boolean Series that is True where there are empty strings or NaN values.
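A runnable sketch of this fillna trick, using a made-up A1 column (the values are assumptions for illustration), shows the resulting mask and how it can be used to pull out the affected row labels:

```python
import numpy as np
import pandas as pd

# Hypothetical string column with one empty string and one NaN.
df = pd.DataFrame({"A1": ["text76", "", np.nan, "Fifty"]})

tmp = df["A1"].fillna("")   # NaN becomes ''
is_empty = tmp == ""        # True for an original '' and for an original NaN

print(is_empty.tolist())
print(df.index[is_empty].tolist())
```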
Answered by Shara
To obtain all the rows that contain an empty cell in a particular column:
DF_new_row=DF_raw.loc[DF_raw['columnname']=='']
This gives the subset of DF_raw that satisfies the condition.
Answered by Alexander
Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together using the bitwise OR operator |.
Sum along axis 0 to find the columns with missing data, then sum along axis 1 to find the index locations of the rows with missing data.
missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)
>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501
6       NaN      NaN
7       NaN      NaN
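The same idea can be written more explicitly with a single boolean mask and any() instead of summing counts. This is a sketch of a swapped-in variant, not the answer's exact code, and the df2 below is a made-up stand-in for the frame used above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2.
df2 = pd.DataFrame(
    {
        "e": ["system", "Fifty", "audio", "Twenty"],
        "A2": ["", -0.031721, -0.508501, np.nan],
        "A3": [1.100353, 1.434273, "", np.nan],
    },
    index=[2, 3, 5, 6],
)

# True wherever a cell is NaN or an empty string.
missing = df2.isnull() | df2.eq("")

# any(axis=0) flags columns, any(axis=1) flags rows.
missing_cols = missing.columns[missing.any(axis=0)]
missing_rows = missing.index[missing.any(axis=1)]

print(df2.loc[missing_rows, missing_cols])
```

Using any() keeps the intermediate result boolean, which makes the intent (at least one missing cell along the axis) easier to read than comparing a sum with zero.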
Answered by saias
Another option, covering cases where a cell might contain several spaces, is to use Python's str.isspace() method.
df[df.col_name.apply(lambda x: not x.isspace())]  # returns only the rows whose value is not purely whitespace (assumes every value is a string)
To exclude NaN values as well (str(x) keeps the lambda from failing on a float NaN):
df[df.col_name.apply(lambda x: not str(x).isspace()) & ~df.col_name.isna()]
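One caveat is that ''.isspace() is False in Python, so a truly empty string is not caught by isspace() alone. A vectorized sketch that treats empty strings, whitespace-only strings and NaN alike is shown below; it swaps in Series.str.strip() for the per-row lambda, and the sample data and the names is_blank, blank_rows and non_blank_rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column: a normal value, '', whitespace only, NaN, inner space.
df = pd.DataFrame({"col_name": ["abc", "", "   ", np.nan, "x y"]})

# Strip surrounding whitespace; .str methods leave NaN as NaN,
# so NaN is checked separately with isna().
stripped = df["col_name"].str.strip()
is_blank = stripped.eq("") | df["col_name"].isna()

blank_rows = df[is_blank]        # '', '   ' and NaN
non_blank_rows = df[~is_blank]   # 'abc' and 'x y'

print(blank_rows.index.tolist(), non_blank_rows.index.tolist())
```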
Answered by Mohamed Abdelsalam
You can also do something like this:
text_empty = df['column name'].str.len() == 0
df.loc[text_empty].index
The result is the index of the rows whose value in that column is an empty string.
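A small sketch on made-up data shows what str.len() reports for each kind of cell: a string gives its length, an empty string gives 0, and NaN stays NaN. Comparing the lengths with 0 therefore singles out the empty strings only. The column contents and the names lengths and empty_index are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical column: a normal value, '', NaN and a whitespace-only string.
df = pd.DataFrame({"column name": ["abc", "", np.nan, "  "]})

lengths = df["column name"].str.len()   # length per cell; NaN stays NaN
text_empty = lengths == 0               # True only for the empty string

empty_index = df.loc[text_empty].index
print(empty_index.tolist())
```

Rows holding NaN or only spaces are not matched by this test, so it needs to be combined with isna() or str.strip() if those should count as empty too.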