Python Pandas: compare two dataframes and remove what matches in one column

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34417964/

Time: 2020-08-19 14:58:01  Source: igfitidea

Pandas compare two dataframes and remove what matches in one column

Tags: python, pandas

Asked by GNMO11

I have two separate pandas dataframes (df1 and df2) which have multiple columns, but only one in common ('text').

I would like to find every row in df2 whose value in the shared 'text' column does not match the 'text' value of any row in df1.

df1

A    B    text
45   2    score
33   5    miss
20   1    score

df2

C    D    text
.5   2    shot
.3   2    shot
.3   1    miss

Result df (remove row containing miss since it occurs in df1)

C    D    text
.5   2    shot
.3   2    shot

Is it possible to use the isin method in this scenario?

Accepted answer by Ami Tavory

As you asked, you can do this efficiently using isin (without resorting to expensive merges).

>>> df2[~df2.text.isin(df1.text.values)]
     C  D  text
0  0.5  2  shot
1  0.3  2  shot
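
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two example frames from the question and applies the same isin filter. The names matches and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the example frames from the question
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# True where a df2 'text' value also appears somewhere in df1's 'text' column
matches = df2["text"].isin(df1["text"])

# Keep only the df2 rows without a match; ~ inverts the boolean mask
result = df2[~matches]
print(result)
```

Note that isin compares values exactly, so differences in case or surrounding whitespace count as "no match". For messier data it may be worth normalizing both columns first, for example with .str.strip().str.lower().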

Answer by Shahram

EDIT:

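import numpy as np
import pandas as pd

# Left-merge on the shared column; df2 rows with no match in df1 get NaN in 'A'
mergeddf = pd.merge(df2, df1, how="left", on="text")

# Keep the unmatched rows and only df2's original columns
result = mergeddf[np.isnan(mergeddf['A'])][['C', 'D', 'text']]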
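
A possible variation on this merge approach, offered as a sketch rather than as part of the original answer: instead of testing column 'A' for NaN (which would misfire if df1 already contained NaN in 'A'), merge can label where each row came from via its indicator parameter. The names keys, merged and result are illustrative.

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# Only the key column is needed from df1; dropping duplicates
# guarantees at most one merged row per df2 row
keys = df1[["text"]].drop_duplicates()

# indicator=True adds a '_merge' column; 'left_only' marks
# df2 rows whose 'text' has no match in df1
merged = df2.merge(keys, on="text", how="left", indicator=True)

result = merged.loc[merged["_merge"] == "left_only", ["C", "D", "text"]]
print(result)
```

Keep in mind that merge returns a frame with a fresh integer index, so any custom index labels on df2 are not carried over to result.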

Answer by Julien Spronck

You can merge them and keep only the lines that have a NaN.

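# Left-merge from df2 (with duplicate 'text' values dropped from df1) so the mask lines up row-for-row with df2
df2[pd.merge(df2, df1.drop_duplicates('text'), how='left').isnull().any(axis=1).values]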

Or you can use isin:

df2[~df2.text.isin(df1.text)]