Python Pandas 比较两个数据框并删除一列中的匹配项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34417964/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas compare two dataframes and remove what matches in one column
提问by GNMO11
I have two separate pandas dataframes (df1
and df2
) which have multiple columns, but only one in common ('text').
我有两个单独的 Pandas 数据框(df1
和df2
),它们有多个列,但只有一个公共(“文本”)。
I would like to do find every row in df2
that does not have a match in any of the rows of the column that df2
and df1
have in common.
我想找到其中的每一行在df2
该列的任何行中都没有匹配项,df2
并且df1
具有共同点。
df1
df1
A B text
45 2 score
33 5 miss
20 1 score
df2
df2
C D text
.5 2 shot
.3 2 shot
.3 1 miss
Result df (remove row containing miss since it occurs in df1)
结果 df(删除包含未命中的行,因为它发生在 df1 中)
C D text
.5 2 shot
.3 2 shot
Is it possible to use the isin
method in this scenario?
isin
在这种情况下是否可以使用该方法?
采纳答案by Ami Tavory
As you asked, you can do this efficiently using isin
(without resorting to expensive merge
s).
正如您所问的那样,您可以有效地使用isin
(无需求助于昂贵的merge
s)。
>>> df2[~df2.text.isin(df1.text.values)]
C D text
0 0.5 2 shot
1 0.3 2 shot
回答by Shahram
EDIT:
编辑:
import numpy as np
mergeddf = pd.merge(df2,df1, how="left")
result = mergeddf[(np.isnan(mergeddf['A']))][['C','D','text']]
回答by Julien Spronck
You can merge them and keep only the lines that have a NaN.
您可以合并它们并仅保留具有 NaN 的行。
df2[pd.merge(df1, df2, how='outer').isnull().any(axis=1)]
or you can use isin
:
或者你可以使用isin
:
df2[~df2.text.isin(df1.text)]