Python Pandas 比较两个数据框并删除一列中的匹配项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34417964/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas compare two dataframes and remove what matches in one column
提问by GNMO11
I have two separate pandas dataframes (df1and df2) which have multiple columns, but only one in common ('text').
我有两个单独的 Pandas 数据框(df1和df2),它们有多个列,但只有一个公共(“文本”)。
I would like to do find every row in df2that does not have a match in any of the rows of the column that df2and df1have in common.
我想找到其中的每一行在df2该列的任何行中都没有匹配项,df2并且df1具有共同点。
df1
df1
A B text
45 2 score
33 5 miss
20 1 score
df2
df2
C D text
.5 2 shot
.3 2 shot
.3 1 miss
Result df (remove row containing miss since it occurs in df1)
结果 df(删除包含未命中的行,因为它发生在 df1 中)
C D text
.5 2 shot
.3 2 shot
Is it possible to use the isinmethod in this scenario?
isin在这种情况下是否可以使用该方法?
采纳答案by Ami Tavory
As you asked, you can do this efficiently using isin(without resorting to expensive merges).
正如您所问的那样,您可以有效地使用isin(无需求助于昂贵的merges)。
>>> df2[~df2.text.isin(df1.text.values)]
C D text
0 0.5 2 shot
1 0.3 2 shot
回答by Shahram
EDIT:
编辑:
import numpy as np
mergeddf = pd.merge(df2,df1, how="left")
result = mergeddf[(np.isnan(mergeddf['A']))][['C','D','text']]
回答by Julien Spronck
You can merge them and keep only the lines that have a NaN.
您可以合并它们并仅保留具有 NaN 的行。
df2[pd.merge(df1, df2, how='outer').isnull().any(axis=1)]
or you can use isin:
或者你可以使用isin:
df2[~df2.text.isin(df1.text)]

