Source: Stack Overflow, http://stackoverflow.com/questions/40867877/. Content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

Pandas analogue of JOIN with WHERE clause

Tags: python, sql, pandas

Asked by Keithx

I'm joining two DataFrames (A and B) in Python's pandas.

The goal is to get all the rows that exist only in B (the SQL analogue: A RIGHT JOIN B ON A.client_id = B.client_id WHERE A.client_id IS NULL).

In pandas, the only way I know to do this is a merge, but I don't know how to set up the condition (the WHERE clause):

x = pd.merge(A, B, how='right', on='client_id')

Accepted answer by piRSquared

Option 1
indicator=True

A.merge(B, on='client_id', how='right', indicator=True) \
    .query('_merge == "right_only"').drop(columns='_merge')

Setup

import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

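Results

With the setup above, only the rows of B with client_id 4 and 5 remain, and valueA is NaN for both because they have no match in A. (The original screenshot is not preserved.)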

More explanation
indicator=True adds another column, _merge, to the result of the merge. It indicates whether each row comes from the left frame only, the right frame only, or both.

A.merge(B, on='client_id', how='outer', indicator=True)

So I just use query to keep the rows whose indicator is right_only, then drop that column.
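
The steps of Option 1 can be put together as a minimal, self-contained sketch. The helper name right_anti_join is hypothetical (it is not part of pandas or of the original answer), and the sketch uses a boolean mask instead of query, which should be equivalent here.

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))


def right_anti_join(left, right, key):
    """Hypothetical helper: rows of `right` whose `key` has no match in `left`."""
    # indicator=True adds a `_merge` column: left_only, right_only, or both.
    merged = left.merge(right, on=key, how='right', indicator=True)
    # Keep only the rows that came from the right frame alone.
    only_right = merged[merged['_merge'] == 'right_only']
    # The indicator column has done its job, so drop it.
    return only_right.drop(columns='_merge')


result = right_anti_join(A, B, 'client_id')
print(result)
```

Note that the result still carries A's columns (here valueA), filled with NaN, which mirrors what the SQL right join would return.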



Option 2
Not really a merge. You can use query again to pull only the rows of B whose client_id values are not in A.

B.query('client_id not in @A.client_id')

Or an equivalent (but faster) way of saying the same thing:

B[~B.client_id.isin(A.client_id)]
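
As a rough, self-contained sketch of the isin approach, using the same sample frames as the setup above (the variable names mask and only_in_B are illustrative, not from the original answer):

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# True for each row of B whose client_id also appears in A.
mask = B['client_id'].isin(A['client_id'])

# Negate the mask to keep only the rows of B that have no match in A.
only_in_B = B[~mask]
print(only_in_B)
```

Unlike the merge in Option 1, this never brings in A's columns, so the result has exactly B's columns. That is usually fine when only the unmatched rows of B are wanted.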

Answer by Quickbeam2k1

For me, this is also a bit unsatisfying, but I think the recommended way is something like:

x = pd.merge(A[A["client_id"].isnull()], B,
             how='right', on='client_id')

More information can be found in the pandas documentation.

Additionally, you might use something like A.where(A["client_id"].isnull()) for filtering. Also, note my mistake in the previous version: I was comparing to None, but you should use the isnull() function.
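
To illustrate the point about None versus isnull(), here is a small sketch. The Series s is made up for the example, and the second half applies isnull() after the merge rather than before it, which is an assumption on my part about how the null check is best used. It also assumes valueA has no missing values in A itself.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Comparing to None elementwise never matches, even for the missing value.
eq_none = (s == None)  # noqa: E711
# isnull() is the reliable way to detect missing values.
is_null = s.isnull()

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# Right merge first, then keep the rows where A contributed nothing.
merged = A.merge(B, on='client_id', how='right')
unmatched = merged[merged['valueA'].isnull()]
print(unmatched)
```

The filter on valueA only works as a stand-in for "A.client_id IS NULL" because merging on a shared key collapses the two client_id columns into one.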