Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original source: StackOverflow, http://stackoverflow.com/questions/39880627/

In Pandas, how to delete rows from a Data Frame based on another Data Frame?

Tags: python, pandas

Asked by Vini

I have 2 Data Frames, one named USERS and another named EXCLUDE. Both of them have a field named "email".

Basically, I want to remove every row in USERS that has an email contained in EXCLUDE.

How can I do it?

Answered by jezrael

You can use boolean indexing with a condition built by isin. The resulting boolean Series is inverted with ~, so that only the rows whose email is not in EXCLUDE are selected:

import pandas as pd

USERS = pd.DataFrame({'email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']})
print (USERS)
     email
0  [email protected]
1  [email protected]
2  [email protected]
3  [email protected]
4  [email protected]

EXCLUDE = pd.DataFrame({'email':['[email protected]','[email protected]']})
print (EXCLUDE)
     email
0  [email protected]
1  [email protected]
print (USERS.email.isin(EXCLUDE.email))
0     True
1    False
2    False
3    False
4     True
Name: email, dtype: bool

print (~USERS.email.isin(EXCLUDE.email))
0    False
1     True
2     True
3     True
4    False
Name: email, dtype: bool

print (USERS[~USERS.email.isin(EXCLUDE.email)])
     email
1  [email protected]
2  [email protected]
3  [email protected]
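
The steps above can be condensed into one self-contained script. This is only a minimal sketch: the addresses are hypothetical placeholders (the ones in the listing above appear in masked form), and the names mask and kept are chosen here for illustration.

```python
import pandas as pd

# Hypothetical placeholder addresses, used only for illustration.
USERS = pd.DataFrame({'email': ['a@example.com', 'b@example.com', 'b@example.com',
                                'c@example.com', 'd@example.com']})
EXCLUDE = pd.DataFrame({'email': ['a@example.com', 'd@example.com']})

# True where a USERS email also appears in EXCLUDE.
mask = USERS['email'].isin(EXCLUDE['email'])

# ~ inverts the mask, so only rows NOT listed in EXCLUDE are kept.
kept = USERS[~mask]
print(kept)
```

The filtered frame keeps the original index labels of USERS. If a fresh 0-based index is wanted, kept.reset_index(drop=True) should provide it.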


Another solution uses merge with indicator=True and then keeps only the rows marked left_only:

df = pd.merge(USERS, EXCLUDE, how='outer', indicator=True)
print (df)
     email     _merge
0  [email protected]       both
1  [email protected]  left_only
2  [email protected]  left_only
3  [email protected]  left_only
4  [email protected]       both

print (df.loc[df._merge == 'left_only', ['email']])
     email
1  [email protected]
2  [email protected]
3  [email protected]
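
The merge approach can be sketched the same way. The addresses are again hypothetical placeholders, including one EXCLUDE entry that is absent from USERS, added here to show the right_only label. Passing on='email' is an assumption of this sketch, not part of the answer above; it restricts the match to that column in case the frames share other column names.

```python
import pandas as pd

# Hypothetical placeholder addresses, used only for illustration.
USERS = pd.DataFrame({'email': ['a@example.com', 'b@example.com', 'b@example.com',
                                'c@example.com', 'd@example.com']})
# 'z@example.com' is not in USERS, so it ends up labelled 'right_only'.
EXCLUDE = pd.DataFrame({'email': ['a@example.com', 'd@example.com', 'z@example.com']})

# indicator=True adds a _merge column with 'left_only', 'right_only' or 'both'.
merged = pd.merge(USERS, EXCLUDE, on='email', how='outer', indicator=True)

# 'left_only' marks rows present in USERS but absent from EXCLUDE.
kept = merged.loc[merged['_merge'] == 'left_only', ['email']]
print(kept)
```

Unlike the isin version, merge builds a new index, so the row labels of the result do not refer back to the rows of USERS.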