Original URL: http://stackoverflow.com/questions/40867877/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas analogue of JOIN with WHERE clause
Question by Keithx
I'm joining two DataFrames (A and B) in Python's pandas.
The goal is to get all rows of B that have no match in A (the SQL analogue: A RIGHT JOIN B ON A.client_id = B.client_id WHERE A.client_id IS NULL).
In pandas, the only tool I know for this is merge, but I don't know how to add the condition (the WHERE clause):
import pandas as pd
x = pd.merge(A, B, how='right', on='client_id')
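To see why a WHERE step is still needed, here is a minimal sketch of what the plain right merge returns. The sample frames are assumed for illustration (they match the setup used in the accepted answer).

```python
import pandas as pd

# Assumed sample data: client_id 3 appears in both frames, 4 and 5 only in B.
A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# The RIGHT JOIN part only: every row of B is kept, matched or not.
x = pd.merge(A, B, how='right', on='client_id')
print(x)
```

The row for client_id 3 has a value from A; it is exactly the row that the SQL WHERE A.client_id IS NULL would discard.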
Accepted answer by piRSquared
Option 1: indicator=True
A.merge(B, on='client_id', how='right', indicator=True) \
 .query('_merge == "right_only"').drop(columns='_merge')
setup
import pandas as pd
A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))
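Putting the setup and option 1 together gives a self-contained sketch. It uses drop(columns='_merge') rather than a positional axis argument, since recent pandas versions no longer accept the axis of drop positionally.

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# Right merge; indicator=True adds a '_merge' column labelling each row's origin.
merged = A.merge(B, on='client_id', how='right', indicator=True)

# Keep only the rows found in B alone, then drop the helper column.
result = merged.query('_merge == "right_only"').drop(columns='_merge')
print(result)
```

The remaining valueA column is all NaN, since these rows had no partner in A.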
More explanation: indicator=True adds another column, _merge, to the merge result that indicates whether each row came from the left frame, the right frame, or both.
A.merge(B, on='client_id', how='outer', indicator=True)
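A minimal sketch of what that indicator column contains for an outer merge, assuming the same sample frames as the setup above. The origin dictionary is only a compact way to display the labels and is not part of the answer.

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# Outer merge keeps keys from both sides; '_merge' says where each key was found.
out = A.merge(B, on='client_id', how='outer', indicator=True)
print(out)

# Map each client_id to its origin label ('left_only', 'right_only' or 'both').
origin = out.set_index('client_id')['_merge'].astype(str).to_dict()
print(origin)
```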
So I just use query to keep only the rows marked right_only, then drop that column.
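If you prefer to avoid query strings, the same filter can be sketched with a plain boolean mask on the _merge column; this is an equivalent alternative, not the answer's own code, and it assumes the setup frames above.

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

merged = A.merge(B, on='client_id', how='right', indicator=True)

# Boolean mask instead of query: True where the row exists only in B.
mask = merged['_merge'] == 'right_only'
result = merged.loc[mask].drop(columns='_merge')
print(result)
```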
Option 2
Not really a merge. You can use query again to pull only the rows of B whose client_id values are not in A:
B.query('client_id not in @A.client_id')
Or an equivalent way of saying the same thing (but faster):
B[~B.client_id.isin(A.client_id)]
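A quick sketch, assuming the setup frames above, confirming that both forms of option 2 select the same rows of B (no merged columns from A are added, which is why it is "not really a merge").

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# '@A' refers to the local variable A inside the query string.
via_query = B.query('client_id not in @A.client_id')

# Same anti-join with a negated isin mask.
via_isin = B[~B.client_id.isin(A.client_id)]
print(via_isin)
```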
Answer by Quickbeam2k1
For me, this is also a bit unsatisfying, but I think the recommended way is something like:
x = pd.merge(A[A["client_id"].isnull()], B,
 how='right', on='client_id')
More information can be found in the pandas documentation
Additionally, you might use something like A.where(A["client_id"].isnull()) for filtering. Also, note my mistake in the previous version: I was comparing to None, but you should use the isnull() function.
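In SQL the WHERE clause filters the rows produced by the join, so a closer pandas translation applies the null check to the merged result rather than to A beforehand. A minimal sketch, assuming the same sample frames as the accepted answer: because merging on a shared column name collapses both keys into one column, A's key is first renamed to a hypothetical client_id_A so that its nulls remain visible after the join.

```python
import pandas as pd

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

# Rename A's key (client_id_A is a hypothetical name) so it is kept as its own column.
A_keyed = A.rename(columns={'client_id': 'client_id_A'})

# A RIGHT JOIN B ON A.client_id = B.client_id
joined = A_keyed.merge(B, how='right', left_on='client_id_A', right_on='client_id')

# WHERE A.client_id IS NULL, applied to the joined rows; keep only B's columns.
result = joined.loc[joined['client_id_A'].isna(), list(B.columns)]
print(result)
```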