Python: How to select rows with one or more nulls from a Pandas DataFrame without listing columns explicitly?

Question

Asked by Lev Selector

I have a dataframe with ~300K rows and ~40 columns. I want to find out if any rows contain null values - and put these 'null'-rows into a separate dataframe so that I could explore them easily.

I can create a mask explicitly:

mask = False
for col in df.columns: 
    mask = mask | df[col].isnull()
dfnulls = df[mask]

Or I can do something like:

df.ix[df.index[(df.T == np.nan).sum() > 1]]

Is there a more elegant way of doing it (locating rows with nulls in them)?

Answer 1

Answered by DSM

[Updated to adapt to modern pandas, which has isnull as a method of DataFrames.]

You can use isnull and any to build a boolean Series and use that to index into your frame:

>>> df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])
>>> df.isnull()
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False
>>> df.isnull().any(axis=1)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> df[df.isnull().any(axis=1)]
   0   1   2
1  0 NaN   0
2  0   0 NaN
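
As a minimal sketch of how this mask could serve the asker's goal of keeping the null rows in a separate frame: the column names and data below are made up for illustration, and `isna` is used as the alias of `isnull` that current pandas provides.

```python
import numpy as np
import pandas as pd

# Illustrative frame (hypothetical data): rows 1 and 2 each contain one NaN.
df = pd.DataFrame({
    "a": [0, 0, 0, 0],
    "b": [1, np.nan, 0, 1],
    "c": [2, 0, np.nan, 2],
})

# True for every row that has at least one missing value in any column.
has_null = df.isna().any(axis=1)

null_rows = df[has_null]        # rows with one or more nulls
complete_rows = df[~has_null]   # rows with no nulls at all

# Number of missing values in each of the selected rows.
nulls_per_row = null_rows.isna().sum(axis=1)
```

Inverting the same mask with `~` gives the complete rows, which should match what `df.dropna()` returns.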

[For older pandas:]

You could use the function isnull instead of the method:

In [56]: df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])

In [57]: df
Out[57]: 
   0   1   2
0  0   1   2
1  0 NaN   0
2  0   0 NaN
3  0   1   2
4  0   1   2

In [58]: pd.isnull(df)
Out[58]: 
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False

In [59]: pd.isnull(df).any(axis=1)
Out[59]: 
0    False
1     True
2     True
3    False
4    False

leading to the rather compact:

In [60]: df[pd.isnull(df).any(axis=1)]
Out[60]: 
   0   1   2
1  0 NaN   0
2  0   0 NaN

Answer 2

Answered by Roko Mijic

def nans(df): return df[df.isnull().any(axis=1)]

Then, whenever you need it, you can type:

nans(your_dataframe)
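
A short usage sketch of that helper, assuming a small made-up frame (the column names and values are hypothetical). It also shows that `None` in an object column is treated as null alongside `NaN`.

```python
import numpy as np
import pandas as pd

def nans(df):
    """Return the rows of df that contain at least one null value."""
    return df[df.isnull().any(axis=1)]

# Hypothetical data: row 1 has a missing name, row 2 a missing value.
your_dataframe = pd.DataFrame({
    "name": ["x", None, "z"],
    "value": [1.0, 2.0, np.nan],
})

# Separate frame holding only the rows with nulls, as the question asks for.
dfnulls = nans(your_dataframe)
```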
