How to remove rows in a Pandas dataframe if the same row exists in another dataframe?
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44706485/
Question by RRC
I have two dataframes:
df1 = row1;row2;row3
df2 = row4;row5;row6;row2
I want my output dataframe to contain only the rows of df1 that do not appear in df2, i.e.:
df_out = row1;row3
How do I get this most efficiently?
This code does what I want, but it uses two nested for-loops:
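import pandas as pd

a = pd.DataFrame({0:[1,2,3],1:[10,20,30]})
b = pd.DataFrame({0:[0,1,2,3],1:[0,1,20,3]})
match_ident = []  # True for each row of a that has no match in b
for i in range(0,len(a)):
    found=False
    # compare row i of a against every row of b
    for j in range(0,len(b)):
        if a[0][i]==b[0][j]:
            if a[1][i]==b[1][j]:
                found=True
    match_ident.append(not(found))
a = a[match_ident]  # keep only the unmatched rows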
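The nested loops compare every row of a with every row of b, so the work grows with len(a) × len(b). A minimal sketch of the same row-matching logic using a set lookup instead, assuming a and b have the same columns in the same order and that all values are hashable. The helper name rows_not_in is hypothetical and not part of the question or of pandas:

```python
import pandas as pd


def rows_not_in(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the rows of a whose full row is absent from b.

    Assumes a and b share the same columns in the same order and that
    every value is hashable.
    """
    # Build a set of plain row tuples from b once.
    b_rows = set(b.itertuples(index=False, name=None))
    # One average O(1) membership test per row of a.
    mask = [row not in b_rows for row in a.itertuples(index=False, name=None)]
    # Boolean-list indexing keeps a's original index labels.
    return a[mask]


a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})
result = rows_not_in(a, b)
print(result)
```

Because NaN never compares equal to itself, rows containing NaN may not be treated as matches by this sketch.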
Answer by jezrael
You can use merge with the parameter indicator and an outer join, then query to filter, and finally remove the helper column with drop:
The DataFrames are joined on all of their shared columns by default, so the on parameter can be omitted.
import pandas as pd

# a and b are the DataFrames from the question
print(pd.merge(a,b, indicator=True, how='outer')
        .query('_merge=="left_only"')
        .drop('_merge', axis=1))
0 1
0 1 10
2 3 30
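A sketch of the same merge/indicator technique, with two swapped-in choices that are not part of the original answer: the join columns are passed explicitly via on=[0, 1], and how='left' is used instead of how='outer'. A left join only produces rows from a, and pandas preserves the left key order, but the result gets a new default index rather than a's original labels (here they happen to coincide).

```python
import pandas as pd

a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

# how='left' keeps every row of a; indicator=True adds a '_merge' column
# that is 'left_only' when the row has no match in b and 'both' otherwise.
merged = a.merge(b, on=[0, 1], how='left', indicator=True)

# Keep only the unmatched rows, then drop the helper column.
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(result)
```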
Answer by unutbu
You could convert a and b into Indexes, then use the Index.isin method to determine which rows the two DataFrames have in common:
import pandas as pd
a = pd.DataFrame({0:[1,2,3],1:[10,20,30]})
b = pd.DataFrame({0:[0,1,2,3],1:[0,1,20,3]})
a_index = a.set_index([0,1]).index
b_index = b.set_index([0,1]).index
mask = ~a_index.isin(b_index)
result = a.loc[mask]
print(result)
yields
0 1
0 1 10
2 3 30
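The answer above hard-codes the column labels [0, 1]. A sketch generalizing the same Index.isin idea to whatever columns a has, assuming b has the same column labels (b's columns are reordered to match a). The function name anti_join is a hypothetical helper, not a pandas API:

```python
import pandas as pd


def anti_join(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of a that do not appear as whole rows in b.

    Assumes a and b share the same column labels.
    """
    cols = list(a.columns)
    # One index entry per row, built from all columns (a MultiIndex when
    # there is more than one column).
    a_index = a.set_index(cols).index
    b_index = b[cols].set_index(cols).index
    # isin gives a boolean array; invert it to keep rows missing from b.
    return a.loc[~a_index.isin(b_index)]


a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})
result = anti_join(a, b)
print(result)
```

Because the mask is applied with a.loc, the original index labels of a are kept, unlike the merge-based approach, which builds a fresh index.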