Select data when specific columns have null values in pandas
Source: http://stackoverflow.com/questions/40408471/
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to the compiler of this page).
Question by yasin mohammed
I have a DataFrame with two date fields, and I want to filter it to see the rows where either date field is null.
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
The result should look like this
ID Date1 Date2
58844880 04/11/16 NaN
59743311 04/13/16 NaN
59598413 NaN NaN
59561198 NaN 04/17/16
I tried this code
df = (df['Date1'].isnull() | df['Date1'].isnull())
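To see why the attempt falls short, here is a minimal sketch that loads the question's table with `pd.read_csv` (an assumption; the question does not say how the data was read). The attempt tests `Date1` twice, so rows where only `Date2` is null are missed, and the expression evaluates to a boolean Series (a mask) rather than to filtered rows, so assigning it to `df` would discard the DataFrame.

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

# The attempted expression: Date1 is tested twice, Date2 is never tested.
attempt = df['Date1'].isnull() | df['Date1'].isnull()

print(type(attempt).__name__)           # Series: a mask, not a DataFrame
print(attempt[attempt].index.tolist())  # [4, 8]: rows 0 and 2 (Date2 null) are missed
```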
Answer by jezrael
Use boolean indexing:
mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
ID Date1 Date2
0 58844880.0 04/11/16 NaN
2 59743311.0 04/13/16 NaN
4 59598413.0 NaN NaN
8 59561198.0 NaN 04/17/16
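The IDs in this output print as floats (`58844880.0`) because row 7 has NaN in the `ID` column, and pandas stores a numeric column containing NaN as `float64`. A self-contained sketch of the same boolean indexing, again assuming the table is loaded with `pd.read_csv`. The masks are combined with `|`; Python's `or` would try to reduce each Series to a single bool and raise a `ValueError`.

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

# One boolean Series per column, combined element-wise with | (logical OR).
mask = df['Date1'].isnull() | df['Date2'].isnull()

# Indexing with the mask keeps only the rows where it is True.
result = df[mask]
print(result)
print(df['ID'].dtype)  # float64, because of the NaN in row 7
```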
Timings:
#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop
In [13]: %timeit (df[df.filter(like='Date').isnull().any(axis=1)])
10 loops, best of 3: 146 ms per loop
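These timings come from the answerer's machine and will vary with hardware and pandas version. A sketch for re-running the comparison with the standard-library `timeit` module instead of IPython's `%timeit`, assuming the table is loaded with `pd.read_csv`. It builds the same 900000-row frame as `pd.concat([df]*100000)` (nine rows repeated in order), using `np.tile` for speed, and also checks that both expressions select identical rows. No expected timings are given because they are machine-dependent.

```python
import io
import timeit

import numpy as np
import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
small = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

# Same row order as pd.concat([small] * 100000), but faster to build.
df = small.iloc[np.tile(np.arange(len(small)), 100000)].reset_index(drop=True)


def explicit_columns():
    return df[df['Date1'].isnull() | df['Date2'].isnull()]


def filter_like():
    return df[df.filter(like='Date').isnull().any(axis=1)]


# Both approaches must select exactly the same rows.
assert explicit_columns().equals(filter_like())

for fn in (explicit_columns, filter_like):
    # Best of 3 repeats, each running the function 3 times.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1000:.1f} ms per loop")
```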
Answer by MaxU
Try this:
In [7]: df[df.filter(like='Date').isnull().any(axis=1)]
Out[7]:
ID Date1 Date2
0 58844880.0 04/11/16 NaN
2 59743311.0 04/13/16 NaN
4 59598413.0 NaN NaN
8 59561198.0 NaN 04/17/16
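`df.filter(like='Date')` keeps every column whose label contains the substring `'Date'`, so the null check covers `Date1` and `Date2` without naming them. Because this is a substring test, a hypothetical column such as `UpdatedDate` would also be picked up; the `regex=` parameter allows a stricter pattern. The sketch below passes `axis=1` as a keyword, which works across pandas versions, and assumes the table is loaded with `pd.read_csv`.

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

# Substring match on column labels: every label containing 'Date'.
by_substring = df.filter(like='Date')
# Stricter alternative: labels that are exactly 'Date' followed by digits.
by_regex = df.filter(regex=r'^Date\d+$')

# True for each row where at least one of the selected columns is null.
result = df[by_substring.isnull().any(axis=1)]
print(list(by_substring.columns))  # ['Date1', 'Date2']
print(result)
```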
Answer by markling
Quickly check which columns have any null values
df.isnull().any()
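`df.isnull().any()` reduces the truth table column by column (the default `axis=0`), giving one boolean per column. Chaining a second `.any()` answers whether the frame contains any null at all. On the question's sample (assumed loaded with `pd.read_csv`), every column, including `ID`, contains at least one NaN.

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

per_column = df.isnull().any()                      # one True/False per column
dates_only = df[['Date1', 'Date2']].isnull().any()  # only the two date columns
anywhere = df.isnull().any().any()                  # a single bool for the whole frame
print(per_column)
print(dates_only)
print(anywhere)
```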
Count the null values in each column
df.isnull().sum()
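`df.isnull().sum()` counts null cells per column, since each True counts as 1; it does not count rows. To count the rows that contain at least one null, reduce across columns first with `any(axis='columns')` and then sum. A sketch on the sample table (assumed loaded with `pd.read_csv`):

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

null_counts = df.isnull().sum()                          # nulls per column: ID 1, Date1 2, Date2 3
rows_with_nulls = df.isnull().any(axis='columns').sum()  # rows with at least one null
print(null_counts)
print(rows_with_nulls)  # 5 (rows 0, 2, 4, 7 and 8)
```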
Get rows with null values
(1) Create a truth table of null values (i.e. a DataFrame with True/False in each cell, according to whether that cell is null)
truth_table = df.isnull()
(2) Reduce it to a boolean Series that shows which rows have any null values
conclusive_truth_table = truth_table.any(axis='columns')
(3) Select the rows that have any null values
df[conclusive_truth_table]
(1)-(3) put together
df[df.isnull().any(axis='columns')]
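Checking every column is not quite what the question asks for: row 7 has NaN only in `ID`, so it is selected by `df.isnull().any(axis='columns')` but does not appear in the expected output. Restricting the check to the date columns gives the expected rows. A sketch on the sample table (assumed loaded with `pd.read_csv`):

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

# Null in any column at all (includes row 7, whose ID is NaN).
any_column = df[df.isnull().any(axis='columns')]
# Null in either date column only (matches the question's expected output).
date_columns = df[df[['Date1', 'Date2']].isnull().any(axis='columns')]

print(any_column.index.tolist())    # [0, 2, 4, 7, 8]
print(date_columns.index.tolist())  # [0, 2, 4, 8]
```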
Alternatively
Select rows that have null values in any of the specified columns
df[df.loc[:, ['Date1', 'Date2']].isnull().any(axis='columns')]
Select rows that have null values in BOTH specified columns
df[df.loc[:, ['Date1', 'Date2']].isnull().sum(axis=1) == 2]
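Comparing the null count with 2 hard-codes the number of columns. `all(axis='columns')` expresses "null in every listed column" directly and works for any column list. A sketch contrasting the "any" and "all" selections on the sample table (assumed loaded with `pd.read_csv`; `cols` is an illustrative name, not from the original answer):

```python
import io

import pandas as pd

# The question's sample table as whitespace-separated text (assumed input format).
DATA = """\
ID Date1 Date2
58844880 04/11/16 NaN
59745846 04/12/16 04/14/16
59743311 04/13/16 NaN
59745848 04/14/16 04/11/16
59598413 NaN NaN
59745921 04/14/16 04/14/16
59561199 04/15/16 04/15/16
NaN 04/16/16 04/16/16
59561198 NaN 04/17/16
"""
df = pd.read_csv(io.StringIO(DATA), sep=r'\s+')

cols = ['Date1', 'Date2']  # hypothetical name for the columns to check
nulls = df.loc[:, cols].isnull()

either = df[nulls.any(axis='columns')]               # null in at least one listed column
both_by_count = df[nulls.sum(axis=1) == len(cols)]   # the answer's counting approach
both_by_all = df[nulls.all(axis='columns')]          # null in every listed column

print(either.index.tolist())       # [0, 2, 4, 8]
print(both_by_all.index.tolist())  # [4]
```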