Pandas dropna - store dropped rows

Question

Asked by wesanyer

I am using the pandas.DataFrame.dropna method to drop rows that contain NaN. This function returns a dataframe that excludes the dropped rows, as shown in the documentation.

How can I store a copy of the dropped rows as a separate dataframe? Is:

mydataframe[pd.isnull(['list', 'of', 'columns'])]

always guaranteed to return the same rows that dropna drops, assuming that dropna is called with subset=['list', 'of', 'columns']?

Answer 1

Answered by anmol

You can do this by indexing the original DataFrame with the unary ~ (invert) operator, which selects the complement of the NA-free DataFrame.

na_free = df.dropna()
only_na = df[~df.index.isin(na_free.index)]

Another option would be to use the ufunc implementation of ~.

only_na = df[np.invert(df.index.isin(na_free.index))]
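One caveat worth checking before relying on the index-based lookup: it compares index labels, so it presumably only works when the index is unique. The sketch below uses made-up data with a deliberately duplicated label (an assumption for illustration, not part of the original answer) and compares the result with a row-wise mask.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the duplicated index label 0 is deliberate.
df = pd.DataFrame(
    {"col1": ["a", np.nan, "c"], "col2": ["b", "c", "d"]},
    index=[0, 0, 1],
)

na_free = df.dropna()

# Index-based lookup: label 0 survives in na_free, so the dropped row
# that shares label 0 is also treated as kept.
only_na_by_index = df[~df.index.isin(na_free.index)]

# Mask-based lookup: evaluated row by row, independent of index labels.
only_na_by_mask = df[df.isna().any(axis=1)]

print(len(only_na_by_index))  # 0
print(len(only_na_by_mask))   # 1
```

With a unique index (the default RangeIndex, for example) both selections should return the same rows.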

Answer 2

Answered by johnchase

I was going to leave a comment, but figured I'd write an answer as it started getting fairly complicated. Start with the following data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
              columns=['col1', 'col2', 'col3'])
df
  col1 col2 col3
0    a    b  NaN
1  NaN    c    c
2    c    d    a

Say we want to keep the rows with NaNs in the columns col2 and col3. One way to do this, based on the answers from this post, is the following:

df.loc[pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
0    a    b  NaN
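For contrast with the expression in the question, here is a small sketch of what pd.isnull returns when it is given a list of column names rather than the columns themselves. It reuses the example frame from this answer; the variable names name_mask and row_mask are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

# Passing a list of names checks the strings themselves, not the columns,
# so the result has one entry per name and says nothing about the rows.
name_mask = pd.isnull(['col2', 'col3'])
print(name_mask)  # [False False]

# Selecting the columns first gives one boolean per row.
row_mask = pd.isnull(df[['col2', 'col3']]).any(axis=1)
print(row_mask.tolist())  # [True, False, False]
```

So the mask has to be built from df[['col2', 'col3']], as in the line above, before it can pick out the rows that dropna would drop.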

So this gives us the rows that would be dropped if we dropped rows with NaNs in the columns of interest. To keep the other rows instead (those without NaNs in these columns), we can run the same code but use ~ to invert the selection:

df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)]

  col1 col2 col3
1  NaN    c    c
2    c    d    a

This is equivalent to:

df.dropna(subset=['col2', 'col3'])

We can test this:

df.dropna(subset=['col2', 'col3']).equals(df.loc[~pd.isnull(df[['col2', 'col3']]).any(axis=1)])

True

You can of course test this on your own, larger dataframes, and you should get the same answer.
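To get both pieces in one step, the mask can be computed once and used twice. The helper below is a minimal sketch, not a pandas API: the name split_on_na is hypothetical, and it assumes subset is a list of column labels and that the default how='any' behaviour of dropna is wanted.

```python
import numpy as np
import pandas as pd

def split_on_na(frame, subset=None):
    """Return (kept, dropped); kept should match frame.dropna(subset=subset)."""
    # Assumption: subset is a list of column labels (or None for all columns).
    cols = frame.columns if subset is None else subset
    has_na = frame[cols].isna().any(axis=1)
    return frame[~has_na], frame[has_na]

df = pd.DataFrame([['a', 'b', np.nan], [np.nan, 'c', 'c'], ['c', 'd', 'a']],
                  columns=['col1', 'col2', 'col3'])

kept, dropped = split_on_na(df, subset=['col2', 'col3'])

# kept agrees with dropna, and dropped holds the rows dropna removed.
print(kept.equals(df.dropna(subset=['col2', 'col3'])))  # True
print(dropped.index.tolist())                           # [0]
```

Because the selection is made with a boolean mask rather than index labels, this should also behave sensibly when the index contains duplicates.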
