python pandas: How to find rows in one dataframe but not in another?

Question

Asked by Pythonista anonymous

Let's say that I have two tables: people_all and people_usa, both with the same structure and therefore the same primary key.

How can I get a table of the people not in the USA? In SQL I'd do something like:

select a.*
from people_all a
left outer join people_usa u
    on a.id = u.id
where u.id is null

What would be the Python equivalent? I cannot think of a way to translate this where statement into pandas syntax.

The only way I can think of is to add an arbitrary field to people_usa (e.g. people_usa['dummy'] = 1), do a left join, keep only the records where 'dummy' is NaN, and then delete the dummy field, which seems a bit convoluted.
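The workaround described above can be sketched as follows, assuming both frames share an ID key column (the column names and sample data are illustrative). pandas' merge also accepts indicator=True, which adds a _merge column marking each row as 'left_only', 'right_only' or 'both'; that column does the job of the dummy field without modifying people_usa.

```python
import pandas as pd

# Illustrative data: people_usa is a subset of people_all, keyed on "ID"
people_all = pd.DataFrame({"ID": [1, 2, 3, 4], "name": ["Ann", "Bo", "Cy", "Di"]})
people_usa = pd.DataFrame({"ID": [2, 4], "name": ["Bo", "Di"]})

# 1) Dummy-column version of LEFT OUTER JOIN ... WHERE u.id IS NULL.
#    Only the key plus the marker is joined, so "name" does not get _x/_y suffixes.
usa_keys = people_usa[["ID"]].assign(dummy=1)
joined = people_all.merge(usa_keys, on="ID", how="left")
not_usa = joined[joined["dummy"].isna()].drop(columns="dummy")

# 2) Same result using merge's built-in indicator column instead of a dummy field
flagged = people_all.merge(people_usa[["ID"]], on="ID", how="left", indicator=True)
not_usa_2 = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

print(not_usa)
```

Note that merge returns a fresh 0..n-1 index rather than keeping the original index of people_all.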

Thanks!

Answer 1

Answered by EdChum

Use isin and negate the boolean mask:

people_all[~people_all['ID'].isin(people_usa['ID'])]

Example:

In [364]:
import pandas as pd
people_all = pd.DataFrame({'ID': [3, 4, 6, 7, 100]})
people_usa = pd.DataFrame({'ID': [3, 4]})
people_all[~people_all['ID'].isin(people_usa['ID'])]

Out[364]:
    ID
2    6
3    7
4  100

So IDs 3 and 4, which also appear in people_usa, are removed from the result. The boolean mask looks like this:

In [366]:
people_all['ID'].isin(people_usa['ID'])

Out[366]:
0     True
1     True
2    False
3    False
4    False
Name: ID, dtype: bool

Using ~ inverts the mask, so only the rows whose ID is not in people_usa are kept.
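As a sketch of how the isin mask maps onto the SQL query in the question: the mask selects whole rows of people_all (the SQL a.*) based only on the key column. The helper name rows_not_in, the country column and the sample data below are hypothetical.

```python
import pandas as pd


def rows_not_in(left: pd.DataFrame, right: pd.DataFrame, key: str = "ID") -> pd.DataFrame:
    """Return the rows of `left` whose `key` value does not appear in `right[key]`.

    Hypothetical helper: an anti-join on a single key column via a negated isin mask.
    """
    mask = left[key].isin(right[key])  # True where the key also exists in `right`
    return left[~mask]                 # keep only the rows without a match


people_all = pd.DataFrame({"ID": [1, 2, 3, 4], "country": ["FR", "US", "JP", "US"]})
people_usa = people_all[people_all["country"] == "US"]

not_usa = rows_not_in(people_all, people_usa)
print(not_usa)
```

Unlike a merge, this keeps the original index of people_all, and it never duplicates rows even if people_usa contains repeated IDs.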

Answer 2

Answered by MaxU

Here is another pandas method that reads much like SQL, .query():

people_all.query('ID not in @people_usa.ID')
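A runnable sketch of the query() approach with illustrative data. Inside the query string, @ refers to a Python variable in the calling scope; here the USA IDs are first pulled into a local variable (usa_ids, a name chosen for illustration).

```python
import pandas as pd

people_all = pd.DataFrame({"ID": [1, 2, 3, 4], "name": ["Ann", "Bo", "Cy", "Di"]})
people_usa = pd.DataFrame({"ID": [2, 4], "name": ["Bo", "Di"]})

usa_ids = people_usa["ID"]  # referenced as @usa_ids inside the query string
not_usa = people_all.query("ID not in @usa_ids")
print(not_usa)
```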

or using NumPy's in1d() function (deprecated since NumPy 2.0 in favor of np.isin(), which behaves the same for these 1-D inputs):

people_all[~np.in1d(people_all['ID'], people_usa['ID'])]
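A self-contained version of the NumPy approach, assuming an ID key column and illustrative data. It uses np.isin, which returns a boolean array aligned with its first argument; that array is then negated and used as a row mask.

```python
import numpy as np
import pandas as pd

people_all = pd.DataFrame({"ID": [1, 2, 3, 4], "name": ["Ann", "Bo", "Cy", "Di"]})
people_usa = pd.DataFrame({"ID": [2, 4], "name": ["Bo", "Di"]})

# Compare only the key columns: whole DataFrames would be converted to arrays
# that include every column, not a one-value-per-row mask
in_usa = np.isin(people_all["ID"].to_numpy(), people_usa["ID"].to_numpy())
not_usa = people_all[~in_usa]
print(not_usa)
```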

NOTE: for those with SQL experience, the "Comparison with SQL" page in the pandas documentation may be worth reading.

Answer 3

Answered by Graham Streich

I would combine the data frames (by stacking them) and then apply the .drop_duplicates method. Documentation can be found here:

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html
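A sketch of the stacking idea, assuming the two frames have identical columns and that matching people have identical rows (drop_duplicates compares every column, not just the key). Passing keep=False drops every copy of a duplicated row; stacking people_usa twice is an added assumption that ensures any USA rows missing from people_all are also discarded rather than kept.

```python
import pandas as pd

people_all = pd.DataFrame({"ID": [1, 2, 3, 4], "name": ["Ann", "Bo", "Cy", "Di"]})
people_usa = pd.DataFrame({"ID": [2, 4], "name": ["Bo", "Di"]})

# Stack the frames; every USA row now appears at least twice
stacked = pd.concat([people_all, people_usa, people_usa])
# keep=False removes all occurrences of a duplicated row, leaving people_all-only rows
not_usa = stacked.drop_duplicates(keep=False)
print(not_usa)
```

One caveat: a row that appears twice in people_all itself would also be removed, so this relies on the primary key being unique, as the question states.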