How to remove rows in a Pandas dataframe if the same row exists in another dataframe?

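Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, distributed under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/44706485/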

Time: 2020-09-14 03:51:39  Source: igfitidea

Tags: python, pandas

Asked by RRC

I have two dataframes:

 df1 = row1;row2;row3
 df2 = row4;row5;row6;row2

I want my output dataframe to contain only the rows that are unique to df1, i.e.:

df_out = row1;row3

How do I get this most efficiently?

This code does what I want, but it uses two nested for-loops:

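import pandas as pd

a = pd.DataFrame({0:[1,2,3],1:[10,20,30]})
b = pd.DataFrame({0:[0,1,2,3],1:[0,1,20,3]})

match_ident = []
for i in range(0,len(a)):
    found=False
    # Compare row i of a against every row of b, column by column
    for j in range(0,len(b)):
        if a[0][i]==b[0][j]:
            if a[1][i]==b[1][j]:
                found=True
    # True means row i has no identical row in b, so it is kept
    match_ident.append(not(found))

a = a[match_ident]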
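The nested loops compare every row of a with every row of b, so the work grows with len(a) × len(b). As a hedged sketch (not from the original thread), the same row-by-row idea can be made roughly linear by hashing b's rows into a Python set of tuples first. It assumes both frames have the same columns in the same order.

```python
import pandas as pd

a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

# Hash every row of b once as a plain tuple (assumes a and b share column order).
b_rows = set(b.itertuples(index=False, name=None))

# One pass over a: True where the row has no identical row in b.
keep = [row not in b_rows for row in a.itertuples(index=False, name=None)]

result = a[keep]
print(result)
```

This keeps the original index labels of a, just like the loop version.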

Answer by jezrael

You can use merge with the indicator parameter and an outer join, then query to filter, and finally drop to remove the helper column:

The DataFrames are joined on all their common columns, so the on parameter can be omitted.

print (pd.merge(a,b, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))
   0   1
0  1  10
2  3  30
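The snippet above reuses a and b from the question. Below is a self-contained sketch of the same merge/indicator technique; anti_join is a hypothetical helper name, and it swaps the outer join for how="left", which should be sufficient here because rows that exist only in b are discarded anyway.

```python
import pandas as pd

def anti_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` with no identical row in `right`."""
    # Join on all common columns; indicator=True adds a "_merge" column
    # whose value is "both" or "left_only" for a left join.
    merged = left.merge(right, how="left", indicator=True)
    # Keep only unmatched rows, then drop the helper column.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")

a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

result = anti_join(a, b)
print(result)
```

Note that merge builds a new default index, so the result is numbered by position in the merged frame rather than by a's original labels; in this example the two happen to coincide (0 and 2).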

Answer by unutbu

You could convert a and b into Index objects, then use the Index.isin method to determine which rows the two have in common:

import pandas as pd
a = pd.DataFrame({0:[1,2,3],1:[10,20,30]})
b = pd.DataFrame({0:[0,1,2,3],1:[0,1,20,3]})

a_index = a.set_index([0,1]).index
b_index = b.set_index([0,1]).index
mask = ~a_index.isin(b_index)
result = a.loc[mask]
print(result)

yields

   0   1
0  1  10
2  3  30
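set_index([0, 1]) hard-codes the column labels. A hedged generalisation (rows_not_in is a hypothetical name, not from the answer) builds the MultiIndex from every column with pd.MultiIndex.from_frame (pandas 0.24 and later) and reorders b's columns to match a before comparing. It assumes every column of a also exists in b.

```python
import pandas as pd

def rows_not_in(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` absent from `right`, compared on all columns."""
    # One MultiIndex entry (a tuple) per row, using every column of `left`.
    left_keys = pd.MultiIndex.from_frame(left)
    # Align `right` to the same column order before building its keys.
    right_keys = pd.MultiIndex.from_frame(right[left.columns])
    return left.loc[~left_keys.isin(right_keys)]

a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
# Same rows as the question's b, but with the columns in the opposite order.
b = pd.DataFrame({1: [0, 1, 20, 3], 0: [0, 1, 2, 3]})

result = rows_not_in(a, b)
print(result)
```

Unlike the merge approach, this keeps a's original index labels.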