Pandas dropna - store dropped rows

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34296292/

Time: 2020-09-14 00:23:18  Source: igfitidea

Tags: python, python-3.x, pandas

Asked by wesanyer

I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.

How can I store a copy of the dropped rows as a separate dataframe? Is:

mydataframe[pd.isnull(['list', 'of', 'columns'])]

always guaranteed to return the same rows that dropna drops, assuming that dropna is called with subset=['list', 'of', 'columns']?

Answered by anmol

You can do this by indexing the original DataFrame with the unary ~ (invert) operator, which gives the inverse of the NA-free DataFrame.

na_free = df.dropna()
only_na = df[~df.index.isin(na_free.index)]

Another option would be to use the ufunc implementation of ~.

only_na = df[np.invert(df.index.isin(na_free.index))]
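One caveat worth illustrating: the index-based lookup above matches rows by label, so it appears to rely on the index labels being unique. The following is a minimal sketch, assuming a made-up frame with a repeated label, that compares it against a boolean mask built with isna().any(axis=1). The frame contents and the variable names only_na_by_index and only_na_by_mask are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with a repeated index label (0 appears twice).
df = pd.DataFrame(
    {"col1": ["a", np.nan, "c"], "col2": ["b", "c", "d"]},
    index=[0, 0, 1],
)

na_free = df.dropna()

# Index-based lookup: label 0 survives in na_free, so both rows labelled 0
# are treated as kept and the dropped row is missed.
only_na_by_index = df[~df.index.isin(na_free.index)]

# Mask-based lookup: one boolean per row position, independent of the labels.
has_na = df.isna().any(axis=1).to_numpy()
only_na_by_mask = df[has_na]

print(len(only_na_by_index), len(only_na_by_mask))
```

With unique index labels the two approaches should select the same rows, so the mask is only needed when duplicates are possible.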

Answered by johnchase

I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
              columns=['col1', 'col2', 'col3'])
df
  col1 col2 col3
0    a    b  NaN
1  NaN    c    c
2    c    d    a

Say we want to keep the rows with NaNs in the columns col2 and col3. One way to do this, based on the answers from this post, is the following:

df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
0    a    b  NaN

So this gives us the rows that would be dropped if we dropped rows with NaNs in the columns of interest. To get the rows that are kept instead, we can run the same code but use ~ to invert the selection:

df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
1  NaN    c    c
2    c    d    a

This is equivalent to:

df.dropna(subset=['col2', 'col3'])

Which we can test:

df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])

True

You can of course test this on your own larger dataframes but should get the same answer.
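As a rough sketch of such a check, the snippet below wraps the mask logic in a small helper and compares it with dropna on a larger random frame. The helper name split_dropna, the frame size, and the 20% NaN rate are all assumptions made for illustration; the helper is not part of pandas and expects subset to be a list of column names.

```python
import numpy as np
import pandas as pd


def split_dropna(frame, subset=None):
    """Return (kept, dropped) for frame.dropna(subset=subset).

    Hypothetical helper, not part of pandas; subset must be a list of columns.
    """
    cols = frame.columns if subset is None else subset
    # True for every row that has at least one NaN in the chosen columns.
    has_na = frame[cols].isna().any(axis=1).to_numpy()
    return frame[~has_na], frame[has_na]


# Larger random frame with roughly 20% of the values set to NaN.
rng = np.random.default_rng(0)
values = rng.random((1000, 3))
values[rng.random((1000, 3)) < 0.2] = np.nan
big = pd.DataFrame(values, columns=["col1", "col2", "col3"])

kept, dropped = split_dropna(big, subset=["col2", "col3"])

# The kept half should match what dropna returns for the same subset.
same_as_dropna = kept.equals(big.dropna(subset=["col2", "col3"]))
print(same_as_dropna, len(kept), len(dropped))
```

Returning both halves from one mask means the kept and dropped frames always partition the original rows, which is the property the question asks about.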
