Python Pandas: compare two DataFrames and remove the rows that match in one column

Question

Asked by GNMO11

I have two separate pandas DataFrames (df1 and df2) which have multiple columns, but only one in common ('text').

I would like to find every row in df2 whose value in the shared column does not match any row of df1.

df1

A    B    text
45   2    score
33   5    miss
20   1    score

df2

C    D    text
.5   2    shot
.3   2    shot
.3   1    miss

Result df (the row containing miss is removed because miss occurs in df1)

C    D    text
.5   2    shot
.3   2    shot

Is it possible to use the isin method in this scenario?

Answer 1

As you asked, you can do this efficiently using isin (without resorting to expensive merges).

>>> df2[~df2.text.isin(df1.text.values)]
     C  D  text
0  0.5  2  shot
1  0.3  2  shot
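For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question by hand and applies the same isin filter. The variable name has_match is introduced only for illustration; it is not part of the original answer.

```python
import pandas as pd

# Frames rebuilt from the question's sample data
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# True for each df2 row whose text also appears somewhere in df1.text
has_match = df2["text"].isin(df1["text"])

# ~ inverts the boolean mask, keeping only the rows without a match
result = df2[~has_match]
print(result)
```

Because this is plain boolean indexing, the surviving rows keep their original df2 index labels.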

Answer 2

EDIT:

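import pandas as pd

# Left-join df2 to df1 on the shared column; df2 rows without a match get NaN in df1's columns
mergeddf = pd.merge(df2, df1, how="left", on="text")

# Keep the unmatched rows (A is NaN) and only df2's original columns
result = mergeddf[mergeddf['A'].isna()][['C', 'D', 'text']]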
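Testing column A for NaN assumes that A never holds a missing value in df1 itself. If that cannot be guaranteed, one possible variation (a sketch, not part of the original answer) is to let merge flag the origin of each row with indicator=True and filter on the resulting _merge column. The helper name keys is hypothetical, and de-duplicating the keys is an added step so that repeated values in df1 do not multiply rows of df2.

```python
import pandas as pd

# Frames rebuilt from the question's sample data
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1], "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1], "text": ["shot", "shot", "miss"]})

# Only the join key is needed from df1; drop duplicates so matches cannot duplicate df2 rows
keys = df1[["text"]].drop_duplicates()

# indicator=True adds a _merge column with values 'left_only', 'right_only' or 'both'
merged = df2.merge(keys, on="text", how="left", indicator=True)

# 'left_only' marks df2 rows with no counterpart in df1; keep df2's columns only
result = merged.loc[merged["_merge"] == "left_only", df2.columns]
print(result)
```

For a single key column this does more work than isin, but the same pattern extends to matching on several columns at once.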

Answer 3

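You can merge them and keep only the rows whose df1 columns are NaN, that is, the rows that came only from df2. Filter the merged frame itself: using the merged mask to index df2 does not work, because the merged frame has its own index.

merged = pd.merge(df1, df2, how='outer', on='text')
merged[merged['A'].isnull()][df2.columns]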

Or you can use isin:

df2[~df2.text.isin(df1.text)]