Original source: http://stackoverflow.com/questions/38184554/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
How to implement left outer join in python pandas?
Asked by marupav
I have been trying to implement a left outer join in Python. I see that there is a slight difference between a left join and a left outer join.
As in this link: LEFT JOIN vs. LEFT OUTER JOIN in SQL Server
I was able to put together the following sample example:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'value1': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                    'value2': np.random.randn(4)})
df3 = df1.merge(df2, on=['key'], how='left')
This gives all the records from df1 (including the ones that also match df2).
But how do I do a left outer join that returns only the records from df1 that are not in df2?
Note: This is only an example. I might have a large number of (different) columns in either dataframe.
Please help.
Accepted answer by EdChum
Set the parameter indicator=True in the merge; this adds a column _merge that records where each row came from. You can then filter just the rows that are left_only:
In [46]:
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'value1': np.random.randn(4)})

df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                    'value2': np.random.randn(4)})

df3 = df1.merge(df2, on=['key'], how='left', indicator=True)
df3
Out[46]:
  key    value1    value2     _merge
0   A -0.346861       NaN  left_only
1   B  1.120739  0.558272       both
2   C  0.023881       NaN  left_only
3   D -0.598771 -0.823035       both
4   D -0.598771  0.369423       both
In [48]:
df3[df3['_merge'] == 'left_only']
Out[48]:
  key    value1  value2     _merge
0   A -0.346861     NaN  left_only
2   C  0.023881     NaN  left_only
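The filtering step above can be wrapped into a small helper. This is only a sketch: the function name left_anti_join and the fixed sample values are made up for illustration (the original uses random numbers), and it assumes a pandas version that supports the indicator parameter. It merges against the key columns of the right frame only, so no value2 column is pulled in and duplicate keys on the right do not multiply rows.

```python
import pandas as pd

# Hypothetical sample data with fixed values so the result is reproducible.
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'value1': [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                    'value2': [10.0, 20.0, 30.0, 40.0]})


def left_anti_join(left, right, on):
    """Return the rows of `left` whose key(s) do not appear in `right`."""
    # Keep only the key columns of `right` and drop duplicate keys,
    # so matching left rows are not repeated in the merge result.
    right_keys = right[on].drop_duplicates()
    merged = left.merge(right_keys, on=on, how='left', indicator=True)
    # 'left_only' marks the rows that found no match in `right`.
    only_left = merged[merged['_merge'] == 'left_only']
    # The helper column is no longer needed once the filter is applied.
    return only_left.drop(columns='_merge')


result = left_anti_join(df1, df2, on=['key'])
print(result)
```

Because on is passed as a list, the same helper should also work when the two frames share several key columns.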
If you are on an older version of pandas that lacks the indicator parameter, use isin with ~ to negate the mask:
In [50]:
df3[~df3['key'].isin(df2['key'])]
Out[50]:
  key    value1  value2
0   A -0.346861     NaN
2   C  0.023881     NaN
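When only the df1 rows are needed, the same isin mask can be applied to df1 directly, without merging first. The sketch below uses fixed sample values (an assumption, in place of the random numbers above) and covers the single-key case only.

```python
import pandas as pd

# Hypothetical sample data with fixed values so the result is reproducible.
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                    'value1': [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                    'value2': [10.0, 20.0, 30.0, 40.0]})

# Boolean mask: True where df1's key also occurs somewhere in df2.
in_df2 = df1['key'].isin(df2['key'])

# ~ inverts the mask, keeping only the df1 rows with no match in df2.
result = df1[~in_df2]
print(result)
```

Since no merge takes place, the result keeps only df1's own columns and its original index labels.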