Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/40408471/

Time: 2020-09-14 02:20:39  Source: igfitidea

Select data when specific columns have null value in pandas

Tags: python, pandas

Asked by yasin mohammed

I have a dataframe with 2 date fields. I want to filter it and see the rows where either of the date fields is null.

ID          Date1       Date2
58844880    04/11/16    NaN
59745846    04/12/16    04/14/16
59743311    04/13/16    NaN
59745848    04/14/16    04/11/16
59598413    NaN         NaN
59745921    04/14/16    04/14/16
59561199    04/15/16    04/15/16
NaN         04/16/16    04/16/16
59561198    NaN         04/17/16

The result should look like this:

ID          Date1       Date2
58844880    04/11/16    NaN
59743311    04/13/16    NaN
59598413    NaN         NaN
59561198    NaN         04/17/16

I tried the code df = (df['Date1'].isnull() | df['Date1'].isnull())

Answered by jezrael

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16
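
For anyone who wants to run this answer end to end, here is a minimal self-contained sketch. The DataFrame is retyped by hand from the sample table in the question, with dates kept as plain strings (an assumption, since the question does not show the dtypes), and the name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; dates are kept as strings (assumption).
df = pd.DataFrame({
    "ID": [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    "Date1": ["04/11/16", "04/12/16", "04/13/16", "04/14/16", np.nan,
              "04/14/16", "04/15/16", "04/16/16", np.nan],
    "Date2": [np.nan, "04/14/16", np.nan, "04/11/16", np.nan,
              "04/14/16", "04/15/16", "04/16/16", "04/17/16"],
})

# Boolean Series: True where either date column is missing.
mask = df["Date1"].isnull() | df["Date2"].isnull()

# Index the frame with the mask to keep only the matching rows.
result = df[mask]
print(result)
```

Note that the attempt in the question tests Date1 twice and assigns the mask itself back to df, so it yields a boolean Series rather than the filtered rows.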

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(axis=1)])
10 loops, best of 3: 146 ms per loop
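
The comparison above can be roughly reproduced outside IPython with the standard `timeit` module. This is only a sketch: the small frame, the 10,000-copy repetition and the helper names `explicit_mask` and `filter_any` are assumptions made for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({
    "ID": [1.0, 2.0, np.nan],
    "Date1": ["04/11/16", np.nan, "04/16/16"],
    "Date2": [np.nan, "04/14/16", "04/16/16"],
})
# Repeat the rows to get a larger frame; 10_000 copies is an arbitrary choice.
big = pd.concat([small] * 10_000).reset_index(drop=True)


def explicit_mask():
    # Combine two per-column null checks with a boolean OR.
    return big[big["Date1"].isnull() | big["Date2"].isnull()]


def filter_any():
    # Select the date columns by name pattern, then reduce row-wise.
    return big[big.filter(like="Date").isnull().any(axis=1)]


# Both approaches should select exactly the same rows.
same_rows = explicit_mask().equals(filter_any())
print("same rows:", same_rows)

for func in (explicit_mask, filter_any):
    seconds = timeit.timeit(func, number=10) / 10
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```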

Answered by MaxU

Try this:

In [7]: df[df.filter(like='Date').isnull().any(axis=1)]
Out[7]:
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16
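
To see what each step of this one-liner does, it can be split up as in the sketch below. The four-row frame is made up for illustration; `date_cols` and `row_has_null_date` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1.0, 2.0, np.nan, 4.0],
    "Date1": ["04/11/16", "04/12/16", "04/16/16", np.nan],
    "Date2": [np.nan, "04/14/16", "04/16/16", "04/17/16"],
})

# filter(like="Date") keeps the columns whose label contains "Date".
date_cols = df.filter(like="Date")
print(list(date_cols.columns))

# any(axis=1) is True for a row if at least one of those columns is null.
row_has_null_date = date_cols.isnull().any(axis=1)

# The row with a missing ID but complete dates is not selected.
print(df[row_has_null_date])
```

This form is convenient when there are many date columns that share a naming pattern, since none of them has to be listed by hand.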

Answered by markling

Quickly see which columns have any null values

df.isnull().any()

Count the null values in each column

df.isnull().sum()
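
The two calls above report per column. A small sketch of how they differ from a count of affected rows follows; the frame is made up for illustration and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1.0, 2.0, np.nan, 4.0],
    "Date1": ["04/11/16", "04/12/16", "04/16/16", np.nan],
    "Date2": [np.nan, "04/14/16", "04/16/16", "04/17/16"],
})

# One boolean per column: does the column contain any null?
has_null_per_column = df.isnull().any()

# One integer per column: how many nulls does the column contain?
nulls_per_column = df.isnull().sum()

# A single integer: how many rows contain at least one null?
rows_with_any_null = int(df.isnull().any(axis=1).sum())

print(has_null_per_column)
print(nulls_per_column)
print(rows_with_any_null)
```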

Get rows with null values

(1) Create a truth table of null values (i.e. a dataframe with True/False in each cell, according to whether that cell is null)

truth_table = df.isnull()

(2) Create a truth table that shows conclusively which rows have any null values

conclusive_truth_table = truth_table.any(axis='columns')

(3) Isolate/show the rows that have any null values

df[conclusive_truth_table]

(1)-(3) Put it all together

df[df.isnull().any(axis='columns')]
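
One thing worth checking with this version is that it looks at every column, not only the date fields. The sketch below uses a made-up frame in which one row is missing only its ID; `all_columns` and `dates_only` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1.0, 2.0, np.nan, 4.0],
    "Date1": ["04/11/16", "04/12/16", "04/16/16", np.nan],
    "Date2": [np.nan, "04/14/16", "04/16/16", "04/17/16"],
})

# Steps (1)-(3) over all columns: the row with a missing ID is included.
truth_table = df.isnull()
conclusive_truth_table = truth_table.any(axis="columns")
all_columns = df[conclusive_truth_table]

# Restricting the check to the date columns leaves that row out.
dates_only = df[df[["Date1", "Date2"]].isnull().any(axis="columns")]

print(all_columns)
print(dates_only)
```

For the output asked for in the question, where a row with only a missing ID is excluded, the check has to be restricted to the date columns.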

Alternatively

Isolate rows that have null values in any of the specified columns

df[df.loc[:,['Date1','Date2']].isnull().any(axis='columns')]

Isolate rows that have null values in BOTH specified columns

df[ df.loc[ :,['Date1','Date2'] ].isnull().sum(axis=1) == 2]
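
The "any" and "both" cases can be compared side by side, as sketched below on a made-up frame. Using `all(axis=1)` in place of the `sum(axis=1) == 2` test is a substitution made here for illustration, not part of the answer; it avoids hard-coding the number of columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1.0, 2.0, 3.0, 4.0],
    "Date1": ["04/11/16", np.nan, np.nan, "04/14/16"],
    "Date2": [np.nan, np.nan, "04/17/16", "04/14/16"],
})

# Null flags for just the two date columns.
date_nulls = df[["Date1", "Date2"]].isnull()

# Rows where at least one of the two dates is null.
either_null = df[date_nulls.any(axis=1)]

# Rows where both dates are null, written two equivalent ways.
both_null = df[date_nulls.all(axis=1)]
both_null_via_sum = df[date_nulls.sum(axis=1) == 2]

print(either_null)
print(both_null)
```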