pandas: get all rows with NaN values

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21202652/

Time: 2020-09-13 21:35:19  Source: igfitidea

Getting all rows with NaN value

Tags: python, numpy, pandas

Asked by MJP

I have a table with a column that has some NaN values in it:

A   B   C   D
2   3   2   NaN
3   4   5   5
2   3   1   NaN

I'd like to get all rows where D is NaN. How can I do this?

Answered by Nipun Batra

Create a DataFrame for illustration (containing NaN). This assumes numpy is imported as np and pandas as pd:

In [86]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [np.nan, 4, 5]})

In [87]: df
Out[87]: 
   a  b   c
0  1  3 NaN
1  2  4   4
2  3  5   5

Check which rows are null in column c:

In [88]: pd.isnull(df['c'])
Out[88]: 
0     True
1    False
2    False
Name: c, dtype: bool
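
The boolean mask can also be counted or turned into index labels. This is a minimal sketch, assuming the same df as above; `isna()` is the method form of `pd.isnull`, and the names `mask`, `n_missing` and `nan_labels` are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [np.nan, 4, 5]})

# Boolean Series, True where column c is NaN (same result as pd.isnull(df['c']))
mask = df['c'].isna()

# True counts as 1 when summed, so this is the number of NaN entries in c
n_missing = int(mask.sum())

# Index labels of the rows where c is NaN
nan_labels = df.index[mask.to_numpy()].tolist()
```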

Check which rows are not null in column c:

In [90]: pd.notnull(df['c'])
Out[90]: 
0    False
1     True
2     True
Name: c, dtype: bool

Select the rows of df where c is not null:

In [91]: df[pd.notnull(df['c'])]
Out[91]: 
   a  b  c
1  2  4  4
2  3  5  5
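
Dropping the rows where c is null should give the same result with `DataFrame.dropna` and its `subset` argument. A short sketch, assuming the same df as above (the names `kept` and `same` are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [np.nan, 4, 5]})

# Drop every row whose value in column c is NaN; other columns are not checked
kept = df.dropna(subset=['c'])

# Equivalent boolean-mask form, using the method version of pd.notnull
same = df[df['c'].notna()]
```

The mask form is handy when the condition has to be combined with other filters, while `dropna` reads more directly when missing values are the only criterion.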

Select the rows of df where c is null:

In [93]: df[pd.isnull(df['c'])]
Out[93]: 
   a  b   c
0  1  3 NaN
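
Applied to the table from the question, the same idea looks roughly like this. The DataFrame below is reconstructed from the question's table, so treat its exact values and dtypes as an assumption; the last line also shows one way to get rows that have NaN in any column, not just D:

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's table (assumed values)
df = pd.DataFrame({
    'A': [2, 3, 2],
    'B': [3, 4, 3],
    'C': [2, 5, 1],
    'D': [np.nan, 5, np.nan],
})

# Rows where column D is NaN
rows_d_nan = df[df['D'].isna()]

# Rows where at least one column is NaN
rows_any_nan = df[df.isna().any(axis=1)]
```

Note that a comparison such as `df['D'] == np.nan` does not work for this, because NaN never compares equal to anything, including itself.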

Select the values of column c where c is not null:

In [94]: df['c'][pd.notnull(df['c'])]
Out[94]: 
1    4
2    5
Name: c, dtype: float64
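
For a single column, `Series.dropna` is a shorter way to write the same selection. A minimal sketch, assuming the same df as above (the names `values` and `masked` are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [np.nan, 4, 5]})

# Non-null values of column c, keeping their original index labels
values = df['c'].dropna()

# Mask-based form from the answer, for comparison
masked = df['c'][pd.notnull(df['c'])]
```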

Answered by Vincenzooo

For a solution that doesn't involve pandas, you can do something like:

import numpy as np
goodind = np.where(np.sum(np.isnan(y), axis=1) == 0)[0]  # indices of rows that contain no NaNs

(or the negation if you want rows with NaN) and use the indices to slice the data. I am not sure sum is the best way to combine booleans; np.any and np.all also accept an axis parameter, so np.isnan(y).any(axis=1) marks the rows that contain NaN directly.
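
A NumPy-only version can be sketched as follows. The array `y` is not defined in the answer, so a 2D float array built from the question's table is assumed here, and `np.isnan(y).any(axis=1)` is used as an alternative to the sum-based test:

```python
import numpy as np

# Assumed input: a 2D float array (values taken from the question's table)
y = np.array([
    [2.0, 3.0, 2.0, np.nan],
    [3.0, 4.0, 5.0, 5.0],
    [2.0, 3.0, 1.0, np.nan],
])

# True for each row that contains at least one NaN
has_nan = np.isnan(y).any(axis=1)

bad_rows = y[has_nan]             # rows with NaN
good_rows = y[~has_nan]           # rows without NaN
goodind = np.where(~has_nan)[0]   # indices of rows without NaN, as in the answer
```

Indexing with the boolean array directly avoids the extra `np.where` step when only the rows themselves are needed.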