How to do "(df1 & not df2)" dataframe merge in pandas?
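Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must also comply with the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverFlow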
Original URL: http://stackoverflow.com/questions/32676027/
Question by GeorgeOfTheRF
I have two pandas DataFrames, df1 and df2, with common columns/keys (x, y).
I want to do a "(df1 & not df2)" kind of merge on keys (x, y), meaning I want my code to return a DataFrame containing only the rows whose (x, y) appear in df1 but not in df2.
SAS has equivalent functionality:
data final;
merge df1(in=a) df2(in=b);
by x y;
if a & not b;
run;
How can I replicate the same functionality elegantly in pandas? It would be great if we could specify how="left-right" in merge().
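For comparison, one possible alternative (not taken from the original thread) avoids a merge altogether and compares key tuples directly. This is a minimal sketch assuming both frames share the key columns x and y; `anti_join` is a hypothetical helper name, and the sample data is invented for illustration.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Hypothetical helper: return rows of `left` whose key tuple does not appear in `right`."""
    # Build one MultiIndex per frame from the key columns
    left_keys = pd.MultiIndex.from_frame(left[keys])
    right_keys = pd.MultiIndex.from_frame(right[keys])
    # MultiIndex.isin matches whole (x, y) tuples, not individual columns
    return left[~left_keys.isin(right_keys)]


# Invented sample data: (x, y) are the shared keys
df1 = pd.DataFrame({'x': [1, 1, 2, 3], 'y': ['a', 'b', 'a', 'c'], 'v1': [10, 20, 30, 40]})
df2 = pd.DataFrame({'x': [1, 3], 'y': ['a', 'c'], 'v2': [100, 200]})

result = anti_join(df1, df2, ['x', 'y'])
print(result)
```

Unlike a merge-based approach, this keeps df1's original index and columns, with no helper column and no suffixed duplicates of shared non-key columns.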
Answer by GeorgeOfTheRF
I just upgraded to version 0.17.0 RC1, which was released 10 days ago, and found out that this release adds a new argument to pd.merge(), indicator; setting indicator=True achieves this in a pandonic way!
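import pandas as pd
# Outer merge on the keys; indicator=True adds a '_merge' column recording where each row came from
df = pd.merge(df1, df2, on=['x', 'y'], how='outer', indicator=True)
df = df[df['_merge'] == 'left_only']  # keep rows whose (x, y) appear only in df1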
indicator: Add a column to the output DataFrame called _merge with information on the source of each row. _merge is Categorical-type and takes on a value of left_only for observations whose merge key only appears in 'left' DataFrame, right_only for observations whose merge key only appears in 'right' DataFrame, and both if the observation's merge key is found in both.
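To make the three _merge values concrete, here is a small self-contained sketch of the indicator approach. The sample frames and the payload columns v1 and v2 are invented for illustration. The filter step also drops the helper column and df2's payload column, which is entirely NaN for left_only rows.

```python
import pandas as pd

# Invented sample data: (x, y) are the shared keys, v1/v2 are payload columns
df1 = pd.DataFrame({'x': [1, 1, 2, 3], 'y': ['a', 'b', 'a', 'c'], 'v1': [10, 20, 30, 40]})
df2 = pd.DataFrame({'x': [1, 3, 4], 'y': ['a', 'c', 'd'], 'v2': [100, 200, 300]})

# Outer merge keeps every key from both sides; indicator=True adds the '_merge' column
merged = pd.merge(df1, df2, on=['x', 'y'], how='outer', indicator=True)
print(merged[['x', 'y', '_merge']])  # shows left_only, right_only and both rows

# "df1 & not df2": keep left_only rows, then drop the helper column and df2's payload column
only_df1 = (
    merged[merged['_merge'] == 'left_only']
    .drop(columns=['_merge', 'v2'])
    .reset_index(drop=True)
)
print(only_df1)
```

Because the outer merge also emits right_only rows with NaN in v1, pandas upcasts v1 to float here. If you only need the left_only rows, how='left' with indicator=True yields the same filtered rows and leaves v1 as integers.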

